Measuring Educational Outcomes

Fundamentals of Testing






INC. 

Atlanta 



BRUCE W TUCKMAN 

Rutgers University 






© 1975 by Harcourt Brace Jovanovich, Inc.

All rights reserved. No part of this publication may be reproduced or
transmitted in any form or by any means, electronic or mechanical, including
photocopy, recording, or any information storage and retrieval system,
without permission in writing from the publisher.

ISBN 0-15-557692-5

Library of Congress Catalog Card Number 7S273
Printed in the United States of America 
Cover art by Lisa, Jennifer, and Elena Noa 



To my grandmother, Hattie Goldberg 



Preface 


The field of tests and measurement is changing at a dramatic rate.
The influences of such new developments as criterion-referenced testing,
accountability, and affective measurement are being seen and felt in
classrooms and administrative offices. Educators are becoming more
aware of the potentially constructive uses of tests in the
decision-making process. Monitoring and certifying student progress are
important ingredients in the change, and test publishers and other experts
are meeting the challenge.

But for testing to contribute to effective education, its classroom
uses must be realized by teachers, because major educational decisions
are often based on the scores students get on tests that teachers
construct and/or administer. Decisions about admission, guidance,
placement, and the awarding of scholarships; diagnosis of student learning;
appraisal of new programs, procedures, materials, and equipment;
program management; and national assessment of educational progress are
all based on tests, some of which teachers themselves build and others
of which are produced commercially and administered in the classroom.
Measuring Educational Outcomes: Fundamentals of Testing was
written in response to this growing and continuing need by teachers
and other educational professionals to be able to (1) construct their own
tests, (2) judge and evaluate the quality of tests, (3) choose from
among published tests, and (4) use and interpret test results.
Considering the importance of tests, it is amazing that supervisors supervise
instructing but not testing, that teacher training focuses primarily, if
not exclusively, on curriculum and instruction with little mention of
testing, and that teachers' lesson plans may be filed although their tests
are rarely viewed by anyone except the students who take them. This book
should help fill that gap in the training of preservice and in-service
teachers.

Because people tend to equate measurement with statistics and to
be wary of the difficult and relatively useless concepts they expect to
encounter in a course in tests and measurement, this book uses some
important techniques to make instruction about testing both
comprehensible and useful. Each of the fifteen chapters in this book begins
with a list of Objectives, that is, intended learning outcomes; each
chapter ends with a Self-test of Proficiency to enable the student to
measure whether he or she has mastered each objective. The objectives
have been prepared in shorthand form, and the test items have been
built around the objectives (a minimum of two for each) to ensure
their fit, or appropriateness. This procedure is itself illustrative of
principles of testing described in the text. Readers are encouraged to
complete the test at the end of each chapter, score it by using the answers
provided at the back of the book, and review those parts of the chapter
as necessary to overcome deficiencies.

Teachers will find that different kinds of tests or test
interpretations are needed for different situations. Two broad types,
norm-referenced (interpretation on a relative basis) and criterion-referenced
(interpretation on an absolute basis), are covered. The latter provides
more concrete information about what an individual student can or
cannot do and, hence, is more applicable to individualized instruction
or any other student-centered approach. Since teachers' own tests are
used for criterion-referenced interpretation, a Criterion-referenced
Checklist, presented in Chapter 11, suggests ways in which teachers
can evaluate their tests in terms of appropriateness, validity, reliability,
interpretability, and usability.

Writing this book has been an exciting experience for me. I have
tried to convey the concepts and practices of this changing and
developing field in an open, practical, and comprehensive way. I have tried also
to show how important testing is as an educational tool, one which, if
properly used, can reveal much useful information that ultimately will
benefit students. By helping teachers to manage the learning process,
tests can actually give them more independence and more time to be
creative. This book shows teachers how to write objectives and to write
items to measure those objectives; to measure students' knowledge as
well as thinking skills, attitudes, performance, and behavior; to check
the adequacy of their tests and then improve the tests if necessary; to
understand and use published tests; and finally, to use the results of
tests to become better teachers.

I would like to acknowledge the publishers who let me reproduce
their copyrighted materials; my students, who continue to inspire me;
my secretary, Sharon Davis, who continues to decipher me; my editors
and reviewers, particularly William J. Wisneski and Louise Baer, who
continue to assist me; and my family, who continue to tolerate me.


Bruce W Tuckman 



Contents 


PREFACE

AUTHOR’S FOREWORD  Testing in Context / From Past Creations
to Present Controversies

PART ONE  PLANNING A TEST

Chapter 1  Putting Measurement and Evaluation in Perspective
Objectives
Why Do We Measure?
How Do We Measure?
A Clarification of Terms
Orientation of the Book
Plan of the Book
Additional Information Sources
Self-test of Proficiency

Chapter 2  Constructing Objectives
Objectives
A Classroom Model
What Are Objectives?
Preparing Objectives
Evaluating Objectives
The What and Why of Taxonomies
Objectives, Taxonomies, Testing, and Teacher
Additional Information Sources
Self-test of Proficiency

Chapter 3  Basing Test Items on Objectives
Objectives
The Relationship between Objectives and Tests
Areas of Objectives
Developing a Content Outline
Additional Information Sources
Self-test of Proficiency

PART TWO  CONSTRUCTING A TEST: TEACHER-BUILT TESTS

Chapter 4  Short-answer Items to Measure Knowledge
and Comprehension
Objectives
Types of Short-answer Items
Unstructured Format
Completion Format
True-False (Yes-No) Format
Two-choice Classification Format
Multiple-Choice Format
Matching Format
Choosing among Short-answer Formats
Additional Information Sources
Self-test of Proficiency

Chapter 5  Essay-type Items to Measure Thinking Processes
Objectives
The Use of Essay-type Items
Items to Measure Application
Items to Measure Analysis
Items to Measure Synthesis
Items to Measure Evaluation
Items to Measure Combinations of Processes
Criteria for Scoring Essay Items
Inter-rater Reliability
Additional Information Sources
Self-test of Proficiency

Chapter 6  Scales and Procedures to Measure Affective Processes
Objectives
Measurement and the Affective Domain
Some Goals of Affective Education
Constructing Scales
Writing Attitude Statements
Constructing a Likert and Two-point Attitude Scale
Constructing Adjective Attitude Scales
Constructing a Nominations Form
Using an Attitude Scale
Additional Information Sources
Self-test of Proficiency

Chapter 7  Checklists and Scales to Measure Performance and Behavior
Objectives
Why Measure Performance and Behavior?
Measuring Performance
Deciding on a Performance Test
Constructing a Performance Test
Scoring the Performance Test
Measuring Behavior
Constructing a Behavior Rating Scale
Using a Behavior Rating Scale
Additional Information Sources
Self-test of Proficiency

PART THREE  EVALUATING A TEST

Chapter 8  Test Appropriateness
Objectives
Why Evaluate a Test?
Applying the Criteria of a Good Test
A Test Should Be Appropriate
Domain-referencing
What Makes Tests Inappropriate?
Appropriateness and the Teacher
Additional Information Sources
Self-test of Proficiency

Chapter 9  Test Validity
Objectives
What Is Test Validity?
Concurrent Validity
Construct Validity
Predictive Validity
Criterion Validity
Determining a Test’s Validity
Additional Information Sources
Self-test of Proficiency

Chapter 10  Test Reliability
Objectives
No Measurement Is Perfect
Five Procedures for Assessing Reliability
Sources of Test Variability
Strategies for Building Reliability into a Test
Using Test Results to Improve a Test's Reliability
Additional Information Sources
Self-test of Proficiency

Chapter 11  Interpretability and Usability of Test Results
Objectives
What Is Interpretability?
Norm-referencing
Criterion-referencing
Determining a Test's Interpretability
Determining a Test's Usability
Selecting a Test: A Summary
Additional Information Sources
Self-test of Proficiency

PART FOUR  USING PUBLISHED TESTS

Chapter 12  Measuring Intelligence or Mental Ability
Objectives
The Concept of Intelligence or Mental Ability
Test Items to Measure Intelligence or Mental Ability
Types of Scores
Properties of Tests
Some Specific Tests of Intelligence or Mental Ability
Tests of Creativity
Issues in Intelligence Testing
Additional Information Sources
Self-test of Proficiency

Chapter 13  Measuring Achievement with Published Test Batteries
Objectives
What Do Standardized Achievement Tests Measure?
Comparing Standardized and Teacher-built Achievement Tests
Comparing Standardized Achievement and Intelligence Tests
Administering a Standardized Achievement Test
Interpreting Standardized Achievement Test Results
Criterion-referenced Achievement Tests
Measuring Achievement in a Selected Area: Reading
Additional Information Sources
Self-test of Proficiency

Chapter 14  Measuring Interests, Attitudes, and Personality Orientation
Objectives
Measuring Interests and Career Orientation
Using Existing Scales to Measure Attitudes
Measuring Personality Orientation
Using Published Affective Measures
Additional Information Sources
Self-test of Proficiency

Chapter 15  Getting the Most from the School Testing Program
Objectives
A Teacher-built Testing Program: The Test Item File
Overall Functions of Testing
Individual Applications of Test Data
Classroom Applications of Test Data
Program and System Applications of Test Data
Additional Information Sources
Self-test of Proficiency

APPENDIX A / A Glossary of Measurement Terms
APPENDIX B / Preparing Test Item Specifications
APPENDIX C / An Example of Standardizing Scores
APPENDIX D / Directory of Test Publishers

ANSWERS TO SELF-TESTS OF PROFICIENCY

REFERENCES

INDEX / Authors

INDEX / Subjects and Tests



Author’s Foreword 


TESTING IN CONTEXT

FROM PAST CREATIONS TO PRESENT CONTROVERSIES

The modern history of testing is the history of testing for intelligence
or mental ability. Tracing the origins of testing will help place in
context the [...] issues and controversies of measurement.

[...] tests were [...] individual differences in skills among adults.
Maskelyne, the astronomer [...], dismissed his assistant [...] because the
latter was recording [...] telescope field eight tenths of a second [...].
Bessel, the astronomer at the Königsberg Observatory in Germany, read
about [this in] the Astronomical Observations at Greenwich [...]. Bessel
compared ten observations of his with those [of ...] and found them to
[differ] by a fraction more than one second. A second set of
[observations in] 1823 yielded the personal equation A - B = 1.223 sec,
[the] difference between Bessel's observations and those of Argelander.
By making subsequent comparisons on more than one occasion with [...],
Bessel demonstrated the variability in the personal equation: [it]
fluctuated from occasion to occasion. As a result [of] Bessel's
discoveries, astronomers took to publishing their personal equations
along with the results of their [observations].

The use of the chronograph simplified the [observation of a]
transit, because the astronomer only had to tap a key at [the moment of]
transit. Hence, the personal equation was reduced to simple "reaction
time," a measure of the time required to react to a simple stimulus.
In 1863, Sir Francis Galton, a half-cousin of Charles Darwin, began
his systematic study of human individual differences. His
Inquiries into Human Faculty and Its Development, published in 1883,
has been regarded by some scientists as the beginning of mental tests.
His tests included the "Galton whistle," a device for determining a
person's highest audible pitch; the "Galton bar," a measure of visual acuity
and judgment; an instrument with weights to measure muscle sense;
and a series of other apparatus-type tests. All these tests were intended
to measure sensory or perceptual characteristics.

In 1884, Galton opened his Anthropometric Laboratory to collect
the characteristic measurements of people. Data for 9,337 persons were
collected, including height, weight, arm span, breathing power, strength
of pull and squeeze, quickness of blow, hearing, seeing, color sense, and
other personal data (Boring, 1950). These data, however, yielded no
noteworthy generalizations about individual differences other than the
erroneous one that women are inferior to men in all capacities.

Galton's emphasis on the inventory of human abilities as a means
of classifying and understanding "human nature" was perhaps excessive
and bore little fruit. His concern with human physical and
perceptual characteristics as measures of mental abilities was misplaced
and misdirected; we now know that physical and mental properties are
often unrelated. His focus on human measurement, though, was an
important step in the evolution of testing.

A contemporary of Galton's, J. McKeen Cattell, an American
psychologist, was also studying individual differences in primarily physical
terms. A list of Cattell's tests is shown below (Cattell, 1890).

(1) Strength of grip, as measured by a device called a dynamometer

(2) Rate of movement: the quickest time in which the hand can be
moved through a distance of fifty centimeters

(3) The smallest perceptible distance between two points on the skin,
known as two-point discrimination

(4) The amount of pressure necessary to cause pain by pressing a strip
of hard rubber upon the forehead

(5) The least perceptible difference in weight, measured by requiring
that two weights be lifted in succession

(6) The speed with which an individual can react to a sound

(7) The speed with which an individual can name ten specimens of
different colors arranged in haphazard order

(8) The accuracy with which an individual can bisect a fifty-centimeter
line

(9) The accuracy with which an individual can indicate an interval of
ten seconds

(10) Immediate rote memory

These tests are considerably more "biological" than today's mental
tests.


The Beginnings of Individual Intelligence Testing. By 1904, Alfred
Binet had established himself as France's premier psychologist and
expert in human individual differences with his studies of the
differences between "bright" and "dull" children. Naturally, when Paris
school officials became concerned about nonlearners [and decided to set]
up special schools with simplified curriculums, they turned to Binet.
[...] troublemakers might [...] children might escape [...], and [Binet
was] asked to develop a procedure for identifying the truly [dull]. In
1905, he and Theodore Simon produced the Binet[-Simon scale, the first]
"intelligence test." It consisted of thirty short [tasks arranged in]
ascending order of difficulty. Binet had discovered that [children]
observed to be best in judgment were also superior [in ...] vocabulary
and other traits. That is, some children [...]. [He] chose his thirty
tasks to represent the [many] areas of intelligence that he felt
reflected general ability. He [...] intelligence by trial and error,
selecting and rejecting [...] correspondence to [...]. [He] revised the
scale in 1908 and again in 1911. The 1908 revision represented the first
use of age norms in [...] determined by comparing a child's performance
to the average performance of children his or her own age.

In 1916 Lewis Terman and his associates at Stanford University
brought the [Binet scale to the United States by] preparing the
Stanford revision of [the Binet-]Simon Scales, which came to be known
as the Stanford-Binet. The revision was so extensive as to constitute, in
effect, a new test. In addition, [it was] restandardized on an
American sample of approximately 1,000 children and 400 adults. For
the first time the [IQ was used] in a test, it being [defined as]
the ratio of mental age divided by chronological age, multiplied by 100.
The Stanford-Binet was [revised in 1937] and again in 1960 and 1972 and
is currently in use as [an individually] administered test of intelligence.

Using the [standardization] sample, test administrators can determine
the age in [months of] children who, on the average, get a task right.
When a child reaches a point on a subtest where he or she cannot
successfully complete an item, [his or her mental age] is the age in
months of [...]. Multiplying the ratio by 100 gives an intelligence
quotient, or IQ, of 100, considered to be the average, or normal,
intelligence quotient for a child.
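
The ratio definition of the intelligence quotient (mental age divided by chronological age, multiplied by 100) can be worked through in a few lines. What follows is only an illustrative sketch, not part of the original text: the function name ratio_iq and the choice of months as the unit for both ages are assumptions made for the example.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Return the ratio IQ: (mental age / chronological age) * 100.

    Both ages are assumed to be expressed in the same unit (months here).
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# A hypothetical ten-year-old (120 months) whose mental age is also 120 months
# has a ratio of 1, which multiplies out to the "average" value of 100.
print(ratio_iq(120, 120))
# The same child performing like the average twelve-year-old (144 months)
# scores above 100; performing like the average eight-year-old, below 100.
print(ratio_iq(144, 120))
print(ratio_iq(96, 120))
```

Because the score is a ratio, a child whose mental age equals his or her chronological age always receives exactly 100, whatever that age happens to be.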


The Proliferation of Intelligence Testing. When the United States
entered the First World War, the government wanted to screen the
thousands of men being inducted into the army so that misfits could
be rejected and the remainder classified for different kinds of training
and different levels of responsibility. To accomplish this, a test known
as Army Alpha was developed; it measured simple reasoning, ability to
follow directions, numerical reasoning ability, and general knowledge.
Officers were found to score higher on the test than enlisted men;
college graduates scored higher than those with only an eighth-grade
education. These results led the test's proponents to conclude that the
test was valid and caused industrialists and educators to adopt it
enthusiastically. Army Alpha was easy to give to large groups, easy to
score, and easy to interpret. However, interpreting the Army Alpha as a
measure of "native intelligence" independent of prior education is a
questionable practice. It seems more reasonable to consider this type
of test a measure of general scholastic ability or of verbal and
numerical reasoning abilities, as Lee J. Cronbach (1970) has suggested.

To counter the bias of Army Alpha toward language skills and
experiences, Army Beta, the first nonlanguage group test, was developed
for illiterate soldiers and soldiers who did not speak English. It was
given to all men who fell below a certain score on the Alpha and was
used chiefly to measure spatial orientation and perceptual speed and
accuracy. However, because it emphasized speed and was patterned
after the Alpha, it produced results very similar to those of the Alpha.
Hence, its nonverbal feature did not eliminate from it all forms of bias.
In 1939 the Wechsler-Bellevue Scale was published. It tests the
intelligence of persons from age 10 through 60 (Wechsler, 1944). Like
the Stanford-Binet, it is an individually administered IQ test, but it
differs from the Stanford-Binet in certain important ways. First, while
the Stanford-Binet is standardized primarily on children and is
somewhat limited in measuring adult intelligence, the Wechsler-Bellevue is
standardized on a more adult population. Second, the Stanford-Binet
[places almost] exclusive emphasis on verbal intelligence, but the
[Wechsler-Bellevue also includes] performance tasks.
Third, and perhaps the most important difference between the
[two, the] Stanford-Binet subtests are organized by
performance of age groups and the Wechsler-Bellevue subtests are
organized by item type. Page [...] of this book lists the performances
measured on each age-graded subtest of the Stanford-Binet. The
Wechsler-Bellevue is made up of the following item-type subtests:

Verbal

(1) Information (general knowledge)

(2) General comprehension (indicating understanding)

(3) Arithmetical reasoning (solving simple arithmetical problems)

(4) Memory span for digits forward and backward (repeating a series
of digits heard once)

(5) Similarities (stating the likeness that exists between two words)

(6) Vocabulary (word meaning)


Performance

(7) Picture arrangement (putting pictures in the correct order to tell
[a story])

(8) Picture completion (noting and naming the missing parts in
pictures)

(9) Block design (putting blocks in a particular order to conform to a
[...])

(10) Object assembly (assembling wooden parts to complete a specified
[...])

(11) Digit symbol ([...] with which each was paired)


[...] a long, [...] one. [...] it became famous mainly [for one
statement]:

[... it is a not unreasonable] hypothesis that genetic factors are strongly
implicated in the average [...] intelligence difference. The
preponderance of [the evidence is, in my] opinion, less consistent with a
strictly environmental hypothesis than with a genetic hypothesis, which,
of course, does not exclude [the influence of] environment or its
interaction with genetic factors.

Naturally, this statement aroused a considerable response in both
professional and [...] claiming
that whites outperformed [blacks] primarily because whites
were genetically superior, [not because] they had been reared in more
favorable circumstances. [To support his] claim, Jensen cited findings of
[...] disadvantaged and minority groups (e.g., Chinese [...])
[...] performance equal to that of whites [on]
Raven's Progressive Matrices. His critics contended that his data were
poor, his analyses questionable, and his conclusions equivocal.
One benefit of the debate was to make educators more careful in
the use of IQ tests. Clearly, blacks and whites as groups do tend to
score differently on IQ tests; perhaps, it could be argued, [because]
blacks have had poorer schooling. Possibly, then, we should reconsider
how tests should be used. Except for the diagnosis of marked
deficiency, we would be wise to focus our testing on what children have
learned as a result of school experiences and to draw conclusions on
this basis rather than on the basis of variations in native intelligence.


Consider the following incident.

The General Aptitude Test Battery (GATB) is a type of broad-based
intelligence test that the United States Employment Service uses in
screening and counseling job seekers. It is also used as a screening
device for apprenticeship training programs. Because blacks tend to
score lower than whites on this test, they are eliminated from
competition for places in apprenticeship programs.

In the late 1960s, while many civil rights groups focused their
efforts on elimination of the test as culturally biased, Ernie Green of
the A. Philip Randolph Institute took a different approach. He
developed a program in Brooklyn's Bedford-Stuyvesant section to train
potential apprenticeship candidates to take the test. Assuming that white
candidates have had more specific training in taking such tests as the
GATB, he added relevant experience to the culture of minority
candidates to overcome the cultural bias. Using mainly self-educated
ex-convicts as teachers, he had black candidates spend six to eight weeks
learning basic math and English. They practiced on items similar to
those found on the current form of the GATB (a review process not
unlike the College Board review or the Regents review in New York
schools).


Their heads filled with vocabulary words, analogies, and math skills,
Ernie Green's first group of graduates took the exams. The results were
gratifying. Virtually all the students in the program were above the
cut-off scores. "They're beating the system," shouted their critics, who
were trying to maintain the status quo. "It's not fair to teach people
to take a test." "But," responded Green, "isn't that what the
middle-class school experience is all about?" The courts agreed. Those who
have never been taught what the tests measure should be given the
opportunity to learn.

The results on tests may be the cause of school behavior and not
just its effect. If children who have higher IQ scores are expected by
their teachers to do better in school than children [with lower IQ
scores], and if teachers are nicer to these [children ...], then [IQ
scores may be not only the effect of] school
experiences but also their cause.





In Summary. Testing can be helpful if its use increases the learning
and performance of children. Envisioning this, Alfred Binet, "the
father of intelligence testing," said:

A child's mind is like a field for which an expert farmer has advised a
change in the method of cultivating, with the result that in place of
desert land, we now have a harvest. It is in this particular sense, the
one which is significant, that we say the intelligence of children may be
increased. One increases that which constitutes the intelligence of a
school child, namely, the capacity to learn, to improve with instruction.
(Binet, 1909.)

It is possible that tests have been imbued with more powers than
they actually possess and used as an excuse for [...]. David
McClelland, a[n authority on] achievement motivation, [...] occupational
[...] scores on College Boards and [...] both require the same kind of
[...] "test games intelligence" (McClelland, 1973).

While we must acknowledge [the] shortcomings of tests, we must
also [recognize that they serve] a variety of important
functions in education. Despite their shortcomings, they are the best
means we have for [...] in a reasonably objective
fashion. They help us gain [the kind] of information about learners and
learning that we need to help students learn.


Bruce W Tuckman 



PART ONE / Planning a Test



CHAPTER ONE / Putting
Measurement and Evaluation
in Perspective


OBJECTIVES

1. Identify reasons for giving tests to students in
the classroom.

2. Describe, in nontechnical terms, how tests are
constructed and evaluated.

3. Identify the meaning of some common terms
used in describing tests.



WHY DO WE MEASURE?


Observations have a highly subjective quality. They often represent
what is in the eye of the beholder rather than what actually exists.
The teacher and the scientist, however, must obtain information
that is accurate, or veridical: it must reflect what is happening, not
what we would like to happen. Tests and other forms of measuring
instruments are designed to replace subjective judgment with
objectivity to the greatest degree possible. Without the detachment
and impartiality of the well-designed, properly used test, the
teacher is left compounding and confounding judgment with
judgment, never having any independent basis for assessing the
behavior of students. (However, it must be remembered that even tests
can be misused to help make a point.)

Sometimes we are faced with the necessity not of observing a
behavior as it tends to happen but of stimulating the behavior to
happen or finding out if the person is capable of performing the
behavior in question. In these situations, we must not only
measure the behavior but, in a sense, sample it from among the
individual's total repertoire of behavior. For all of these purposes we use
what are commonly called tests.

Because teachers are faced with the responsibility of recording,
measuring, and evaluating the behavior and performance of
their students, they will find tests a valuable tool. With the aid of
tests they can monitor student learning and diagnose strengths
and weaknesses as they occur in their students.


Some Case Studies

A few case studies will help illustrate the ways teachers can use
tests and give some indication of their importance in the
teaching-learning process.

• Miss Farragut, a first-grade teacher, had a child in her class
who was slow in reading and had difficulty in drawing common
shapes. Miss Farragut asked the school psychologist to test this
girl. He administered the Stanford-Binet Intelligence Scale and the
Bender Gestalt Test, and she was found, based on the latter, to
have a perceptual difficulty. Because of this diagnosis, the child
was given special perceptual activities by her teacher to permit
her to learn more easily and to strengthen her performance in this
area of weakness.





• Mr. Johns, the chemistry teacher, was concerned about
letting his class use acids before they had mastered laboratory
techniques and practices. Rather than trying to use his own judgment
to determine when the class was ready, he decided to create a test
situation. Everyone was assigned two Erlenmeyer flasks, a tripod,
filtering equipment, and a burette. Each was also given vials of
chemicals marked with the names of acids, bases, and other
compounds. (Actually, the vials contained harmless substances.)
Instructions for a laboratory experiment were also given. Mr. Johns
had prepared a checklist of safe laboratory practices associated
with the mock experiment. Each of the twelve statements on his
list was essential to laboratory safety. As he walked around the
room, he wrote the initials of all students who performed a safe
practice next to the name of that practice on his checklist. At home
that night he tallied the marks and identified those students who
had demonstrated safe practices. Each of these students was
awarded a safety certificate and a button and was permitted to use
the laboratory. The other students could use the laboratory only
when accompanied by a member of the safety group, who would
use the checklist to judge their behavior in an effort by them to
obtain the safety certificate too.
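
Mr. Johns's evening tally can be sketched as a small program. This is a hedged illustration only: the text does not say how many practices a student had to demonstrate, so the sketch assumes a student joins the safety group only when his or her initials appear next to every practice on the checklist (each statement is described as essential). The function names and the sample practices and initials are hypothetical.

```python
from collections import Counter


def tally_checklist(checklist: dict[str, set[str]]) -> Counter:
    """Count, for each student, how many practices carry his or her initials.

    `checklist` maps the name of a safe practice to the set of initials
    written next to it during the observation.
    """
    tally: Counter = Counter()
    for initials in checklist.values():
        tally.update(initials)
    return tally


def safety_group(checklist: dict[str, set[str]]) -> list[str]:
    """Return students observed performing every practice (assumed criterion)."""
    required = len(checklist)
    tally = tally_checklist(checklist)
    return sorted(student for student, count in tally.items() if count == required)


# Hypothetical three-item excerpt from a twelve-item checklist.
example = {
    "wears goggles": {"AB", "CD", "EF"},
    "labels every vial": {"AB", "EF"},
    "adds acid to water": {"AB", "CD"},
}
print(safety_group(example))
```

Recording initials per practice, rather than a single pass/fail mark per student, is what lets the same checklist later show each remaining student exactly which practices are still missing.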


• Mrs. Thomas was a fifth-grade teacher who every spring
administered a standardized achievement test battery to her class.
During the summer she carefully studied the class list of results
that appeared on a computer print-out supplied by the testing
company. She noticed that many of her students had done poorly on
the language mechanics subtest. Upon further examination of the
individual records and a closer look at the test itself (a copy of
which she kept in her files), she realized that her students had not
clearly understood the principle of using a comma to separate
nouns in sequence. She made a note to herself to include this topic
in one of her lesson plans so that this deficiency would not reoccur
in her next fifth-grade class. She also made a note to report this
finding to her principal for the benefit of the sixth-grade teachers.


• Miss Levine was teaching English to high school seniors
and working on her master's degree. She had a theory that the
[...] toward life, if only in subtle ways. For the research requirements
in her master's program, and to test her insight as a teacher, she
[set out to test this theory] by collecting data. She asked each
of her students to write a composition about the most frustrating
experience of his or her life, and scored each composition for the
presence or absence of each of six needs that seemed important to
students of this age: power, achievement, affiliation, nurturance,
succorance, and autonomy. Prior to doing this, she had checked
"needs" in the Mental Measurements Yearbook.1 A test called the
Edwards Personal Preference Schedule that measured sixteen
personal needs and used a short-answer format was described and
reviewed. Miss Levine acquired this test and used it to measure
students on the same six needs measured on the compositions. To
her delight, the needs of students as measured on the Edwards test
were quite like those measured on the compositions, even when
another teacher rated the compositions. Miss Levine had not only
substantiated her hunch but now had some insight as well into
the needs of her individual students.


• Mr. Detwhiler, who taught automotive engineering, developed
a test he called the Auto Troubleshooters Competency Test.
On one of the engines he had on hand, Mr. Detwhiler would locate
or program in a fault or malfunction. The student was then given
the task of locating the fault and was asked to keep track of his or
her time and also to keep a written record of everything he or she
did. If the student located the fault, he or she was given 100
points. One point was then deducted for each minute it took to
locate the fault, and ten points were deducted for each step taken
that a qualified mechanic would not take in locating the fault. Any
student who could earn sixty-five points was "certified" as
competent on Mr. Detwhiler's test.
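
Mr. Detwhiler's scoring rule is simple arithmetic, and a short sketch may make it concrete. The function names below are illustrative only, and two details are assumptions that the case does not state: a student who never locates the fault is given zero points, and a score is not allowed to fall below zero.

```python
def troubleshooting_score(found_fault, minutes, unnecessary_steps):
    """Score one attempt with the point rules described in the case.

    Assumptions (not stated in the case): a student who does not
    locate the fault receives 0, and scores are floored at 0.
    """
    if not found_fault:
        return 0
    # 100 points, less 1 per minute and 10 per step that a
    # qualified mechanic would not have taken.
    score = 100 - minutes - 10 * unnecessary_steps
    return max(score, 0)


def is_certified(score, cutoff=65):
    """A student is 'certified' at sixty-five points or more."""
    return score >= cutoff


# A student who finds the fault in 15 minutes with 2 unnecessary steps:
example = troubleshooting_score(True, 15, 2)  # 100 - 15 - 20 = 65
print(example, is_certified(example))
```

Under this rule a single unnecessary step costs as much as ten minutes of working time, so the test rewards a mechanic's method more heavily than speed.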


• Mrs. Shore was taking part in an experimental program.
She was being trained to follow the principles of the British Infant
School-Open Classroom approach in her third grade classroom.
The experimental program also included an evaluation to determine
the extent to which it was meeting its goals. Since one of its
goals was to make school a more positive experience for youngsters,
a measure called the School Sentiment Index was used. All
the students in Mrs. Shore's class filled out this instrument so that
their attitudes toward school could be determined. Mrs. Shore was
pleased to know that children in her class scored high on this
instrument relative to children in other classes. It made her feel
that the methods she had been trying out were helping her to relate
to her students.

1 The Mental Measurements Yearbook is a detailed listing of virtually
all tests [...] at its time of publication. It also includes test
reviews. The MMY, currently in its seventh edition (1972), is edited
by Oscar K. Buros and published by The Gryphon Press.

• Mr. Price was a sixth grade teacher. For years he had been
complaining about the use of standardized tests to evaluate his
students' progress in math, reading, and language arts. "Of what
value is it," he was often heard to say in the teachers' room, "to
know whether our kids do better than kids in Santa Fe, New
Mexico, on tests that measure some things that we don't even
teach?" Occasionally his arguments were stronger if less rational.
One day he decided to build his own test. He made up a list of his
teaching goals in math, reading, and language arts and circulated
them to all the sixth grade teachers in his district. He also asked
some fifth and seventh grade teachers for their opinions of his list
of goals. When he had arrived at a sufficiently acceptable list, he
began to make up test items to measure each goal. After he had
constructed two test items for each objective, he administered the
items to students and, based on the results, eliminated the items
that didn't seem to be measuring what he intended to measure.
For the past two years now Mr. Price's district has been using his
test on a district-wide basis at the sixth grade level in place of a
standardized test battery, and there are plans to try the same
approach at other grade levels.

• Miss Rodriguez taught art appreciation in a middle school.
She was a new teacher and in her teacher training had not been
taught much about testing and had had little interest in learning
about it. Now, in her second year of teaching, she became very
concerned about what her students were learning and how she could
measure their learning. She had thus far depended on her instincts
but had come to lose confidence in her informal judgments. She
wanted to construct a good test but did not know how. Fortunately
for her, her district ran an in-service workshop devoted to testing,
where it was suggested that she first construct her instructional
objectives as a basis for designing a good test. She tried this
approach and subsequently found that it was easier for her to write
test items.


Based on cases such as those above, we can list the following
uses of tests:


(1) To Give Objectivity to Our Observations. As educators we
are used to making observations. Since we are concerned with
shaping human behavior, we must constantly be observing it. At
times we evaluate the behavior we are observing in terms of a set
of criteria or standards that may be unspecified and operate only
within our minds. These observations often lack specificity and
exactness, and in some situations that is not necessarily a problem.
However, there are occasions in all endeavors, education
being no exception, when reasonably precise observations are
needed. Precision in observation refers to the accuracy with which
we are able to capture a particular quality or component of the
behavior before us. To measure behavior objectively we need
measuring instruments. We need measuring instruments that record
behavior from a neutral vantage point so that we can apply our
own standards and values in evaluating it.


(2) To Elicit Behavior under Relatively Controlled Conditions.
Can we judge student performance from homework? From class
work? Such judgments must be limited by the many variables
that operate in these situations. Were there distractions? Was
the student given help? Did he or she look up the answer in a


Some Reasons 
for Tests and 
Measurement 



and how reasonably it is interpreted. Such considerations are the
subject of this book.


HOW DO WE MEASURE?


Because detailed information on the techniques and procedures of
measurement will be presented throughout the book, this section
is intended merely as an overview of the measurement process.
The first step in measurement is to decide what it is you want
to measure. Measurement requires a fairly precise set of goals or
objectives that will guide the measurer in choosing his or her
procedures. Trying to measure something that has not been clearly
stated is like trying to put together a jigsaw puzzle without first
seeing a picture of what it will look like when completed. It is
hard to know which pieces to combine unless you know how the
end result should look.

Once you have your objectives you are ready for your measuring
instrument (within the limitation that all of your objectives
may not be immediately measurable). You can either construct
your own instrument or use one already in existence. (A third
possibility is to adapt and modify an existing instrument, a
combination of the two primary options.) Since test construction
ordinarily is more time-consuming than stating objectives, it may be
advantageous to use an existing instrument if one is available that
meets your objectives. We are fortunate in having a compendium
like the Mental Measurements Yearbook, which lists most if not
all of the commercially available tests along with pertinent
information about each and critical reviews of many. A careful
scrutiny of the MMY will often provide the test user with the names
of instruments relevant to his or her objectives. At other times
instruments may be mentioned in professional journals and magazines;
writing to the author will bring you a copy of the instrument in
question. Another reliable source of information about existing
tests is the literature provided by the major testing companies.²

Obviously, test construction is more complex than test selection.
Starting with objectives, the test developer uses a set of rules
such as those described in Part II of this book to develop test
items. Such items may be short answer or open-ended (e.g., essay);
they may involve paper and pencil or actual physical performance;
they may deal with what we know and think or what we like and
how we feel; they may be designed for the student to fill out or
for the teacher or other observer to fill out. The test items are the
critical mass of the test. They are the controlled situations, each
aimed at sampling some aspect of human behavior. Success in
sampling will always be determined by how good the items are.
The test user cannot make valid judgments with a test that is
imprecise or inaccurate. The worth of a test is measured in terms
of its validity or appropriateness and reliability. Validity and
appropriateness address themselves to the question of whether the
test measures what it is supposed to measure. To establish validity
it is often helpful to have some independent way of assessing the
property that the test is supposed to measure. Sometimes, in the
absence of any such independent criteria, tests must be evaluated
in terms of their fit to the objectives of which they are supposed to
be a measure. In this book we will refer to the fit between a test
and its objectives as the appropriateness of the test. Reliability
refers to the test's consistency. Whatever a test measures, it must
measure the same thing on each occasion it is used. It must give
us as error-free an estimate of the property to be measured as is
possible. Thus the test developer must not only write test items;
he or she must also evaluate them against certain criteria.

2 A list of major test publishers appears in Appendix D.

Tests must also have some basis for interpretation. That is,
the test must provide the user with a way of evaluating the
performance of a person. What does the person's score mean? What
does it tell us about him or her? The score itself cannot be
considered to be the final product of a test. It must be interpreted.
There are two ways of interpreting test scores. One way is in terms
of the scores that other people get on the same test. We can talk
about the fact that a score is higher than 65 percent of all the
scores obtained on a test, or we can talk in absolute terms about
how good a score is by virtue of some other criterion. If a test
represents what a person should know to become an accountant, then
we may demand that a person get 75 percent of it right without
regard to how many or how few attain this level of performance.
At any one testing it is possible that no one may attain this
predesignated level.³

The skills of measurement also include test administration.
To be able to measure you must be able to give a test
under controlled conditions and you must [...]
human beings, particularly children, must be aware of and sensitive
to their rights and undertake all necessary steps to protect
them. This protection is as much a part of testing as the
construction of test items.

3 We refer to the first kind of test interpretation as norm referencing
and the second kind of test interpretation as criterion referencing.


A CLARIFICATION OF TERMS 


Because the terms evaluation, measurement, and testing are used
throughout this book, it is important that their meanings be clear.
Evaluation is a process wherein the parts, processes, or outcomes
of a program are examined to see whether they are satisfactory,
particularly with reference to the program's stated objectives,
our own expectations, or our own standards of excellence.
The assessment of a program's outcomes or results is facilitated
by measurement. In other words, tests may be used constructively
in the process of evaluation. Essentially, tests are tools that are
useful in a number of processes such as evaluation, diagnosis, or
monitoring.

The entire field of inquiry with which this book deals can be
called measurement. Measurement is a broad term that refers to
the systematic determination of outcomes or characteristics by
means of some sort of assessment device. Testing is a less specific
term that has typically been taken to mean educational
measurement. More specifically, a test can be considered to be a
kind or class of measurement device typically used to find out
something about a person. It is the kind of measuring device in
which the person provides samples of his or her own behavior by
answering questions or solving problems. Similarly, the words
inventory, questionnaire, opinionnaire, scale, and the like have
been used to label measuring instruments in which a person
provides answers to questions.


The word test has many unfavorable connotations that will be
judiciously avoided in this text. A test is sometimes taken to mean
[...] sense of that term. For our purposes [...]
device, one that the individual completes himself or herself, as
contrasted with one completed by an observer, and whose intent
is to determine changes or gains resulting from a particular
educational experience.

[Cartoon: "Pssst! Want to get straight A's?"]

There are also the terms validity, reliability, and
appropriateness, which were discussed earlier in the chapter.

The term objective will appear with frequency. An objective
is an intended outcome for learners as a result of certain
experiences. Where possible, objectives are generally stated in
observable, hence measurable, terms. When so stated they serve as a
description of the intended behavior of the learner rather than
that of the teacher.

Norm referenced and criterion referenced are terms used to
describe types of test score interpretation. Scores from norm
referenced tests are interpreted on a relative basis in terms of the
performance of a "test" or sample group (called a norm group),
while scores from criterion referenced tests are interpreted on the
basis of some absolute performance criterion such as "does she
know it" or "can he do it." Criterion referenced tests are built on
the assumption that tests are tools that provide an accurate
representation of absolute performance. Often called proficiency
tests, they are used to determine which objectives a student has
acquired competency in.
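
The difference between the two kinds of interpretation can be sketched in a few lines. The function names and the sample scores below are hypothetical. The 75 percent requirement echoes the accountant example given earlier in the chapter, and the first function counts only scores strictly below the one being interpreted, which is one of several common conventions.

```python
def percent_below(score, norm_group_scores):
    """Norm referencing: what percent of the norm group scored lower?"""
    below = sum(1 for s in norm_group_scores if s < score)
    return 100.0 * below / len(norm_group_scores)


def meets_criterion(items_correct, items_total, required_percent=75):
    """Criterion referencing: compare the score with a fixed standard.

    The result does not depend on how anyone else performed.
    """
    return 100.0 * items_correct / items_total >= required_percent


# Hypothetical norm group of ten scores on the same test.
norm_group = [40, 50, 55, 60, 62, 65, 70, 72, 80, 90]

relative = percent_below(70, norm_group)   # six of ten scores are lower
absolute = meets_criterion(30, 40)         # 30 of 40 items is 75 percent
print(relative, absolute)
```

The same raw score can therefore look strong on a relative basis and still fall short of an absolute standard, or the reverse.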





ORIENTATION OF THE BOOK 

The subject of testing has usually been treated in such a highly
technical way as to remove it from the level of some teachers and
thereby render it into magic. This situation is bad for at least two
reasons: first, that the teacher will miss out on the many benefits
of a testing program, and second, that testing programs will be
carried out without any degree of teacher control. Moreover, every
teacher must do some testing, and testing, like other teaching
techniques, requires training. Thus the first tenet of this book's
orientation is that testing must be presented in a manner that can
be understood, that is, a less technical manner.

Testing must have a purpose. You cannot test nothing; you
must test something. The orientation taken in this book is that
the development of measurement instruments of any kind must be
preceded by the preparation of objectives. In other words, before
a teacher can prepare a test he or she must have decided what it
is he or she wants to measure, that is, what the objectives are.
The third point to be made is that testing is a tool that can
help teachers help their students. Thus usability and ease of
interpretation are important criteria in test construction and
selection. And test results should be used as part of the
instructional process.

Fourth, feedback and evaluation as aspects of instruction
often require testing as a source of data. Needless to say,
accountability is facilitated by the availability of relevant test
data. Instructional approaches that involve individual student
progress typically require testing as a means of monitoring student
performance. Testing can help teachers identify areas in which more
emphasis is needed.

Fifth, testing need not be restricted to those things that are
easy to test. When this happens, the easy-to-test things tend to
become the most important criteria for evaluation to the exclusion
of equally or more important criteria. Creativity (that is, the
formulation of original yet appropriate solutions) gives way to
intelligence; problem solving gives way to simpler forms of
achievement based largely on memory; attitudes and feelings about
self and school give way to reading level. This need not be the
case. Measurement can itself be creative, enabling the more complex
and in many cases more meaningful criteria to enter the classroom.

Sixth, tests need not necessarily be used to compare students
[...] comparative information is [...]. Put more precisely, tests
can be used to determine the level and degree of performance of
individual students by comparing their performance against
independent criteria. For example, if a youngster can add two
fractions and get the right answer on two occasions, we might
reasonably conclude he or she knows how to add two fractions. We
need not always be concerned with the percentage of students
nationally who have mastered this skill. It is often sufficient to
have determined the degree of mastery of an individual child as a
basis for determining whether that child is ready to proceed to new
learning.

Seventh, tests must be consistent with the kinds of instruction
and the kinds of learners they are being used to evaluate. New
forms of instruction such as individualized instruction and open
education require tests that are consistent with the goals and
objectives of these programs. Students with different backgrounds
or different learning patterns often require different kinds of tests
to measure their capabilities. An hourglass would not be an
effective instrument for measuring the acceleration of a jet plane.
Modern instruction requires modern testing.


PLAN OF THE BOOK

The book is organized into four parts. Each is described briefly
below.


Part One/
Planning a Test

The first step in testing is to plan the test you will need. This is
done by specifying your objectives to be measured and then
relating these objectives to specific test items. Before you can
write or select test items, you must prepare objectives.
Accordingly, the first part of this book deals with the actual
procedures for writing objectives and for relating these objectives
to test items.


Part Two/
Constructing a Test: Teacher-built Tests

Basically there are two ways to obtain a test: construct it or find
it. If it is unique to your needs, then you have no choice but to
construct it. This section of the book will describe how to construct
tests in different areas and of different types: paper and pencil
tests, performance tests, attitudinal tests, value tests, tests of
social structure, short answer tests, essay tests, checklists,
questionnaires, and classroom [...]. [...] is provided on how to
build, use, and interpret these kinds of tests to serve a wide
variety of needs.





Part Three/ 
Evaluating a Test 


A test is a tool for evaluation that must itself be
tested for its suitability as a prerequisite to using it. Within this
process of evaluating a test, the following questions may be asked
as aids or tools to assist in the judgment:
(1) Does it meet my objectives; is it appropriate? (2) Does it
measure what I am using it to measure; is it valid? (3) Does it
measure that consistently and accurately; is it reliable? (4) Does
it provide results that can be understood and applied; is it
interpretable? and (5) Is it reasonably easy to administer; is it
usable?
In order to answer these questions one must understand and
be able to apply skills and concepts such as that of
appropriateness, types of test validity, standard error of
measurement, reliability coefficients, norm referencing, standard
scores, grade equivalent scores, and criterion referencing. These
and related concepts are presented in Part Three along with their
means of implementation, not only for teachers to evaluate their own
tests but also to understand the evaluation of published or
standardized tests that follows.


Part Four/
Using Published Tests


The alternative to building a test is finding one. Often teachers
are called upon to administer and interpret tests that others have
selected. When this is the case, teachers may be at a marked
disadvantage. To overcome or offset this disadvantage, this book will
deal with existing tests in a wide variety of areas such as
achievement, intelligence (or mental ability or aptitude), reading,
interest, personality, and so on. In addition to describing the major
tests available in each area, descriptions of the concepts themselves
and what they mean are offered along with some bases for choosing
between alternative tests. The administration and interpretation
of the major tests in the various areas is also covered.

Testing programs and procedures can be used for a variety
of purposes in the school, such as assessing student progress,
providing student feedback, evaluating instruction, and determining
whether the district's program is thorough and efficient. The last
chapter of this Part will draw upon materials covered in Parts One
[...] programmatic uses of tests so
that their procedures and results can be better understood.
Applications [...] administrators will be outlined [...].





Because the ideas about testing and test item writing
presented in this book are organized around measurable objectives,
it will be necessary for the teacher first to understand and be able
to write objectives. Therefore, before we turn to test item
construction (Part Two), we will deal with the matter of writing
objectives.







Self-test of Proficiency

(1) Mr. Carlson tests all of his students the first day of class. Which of
the following reasons for testing would be unsuitable?

a. to get an indication of a student's proficiency
b. to assign grades
c. to plan class assignments
d. to diagnose a specific deficiency

(2) Observation is sufficient to judge the extent of a person's knowledge
or the depth of his or her feelings.

TRUE FALSE

(3) What is the first step in measurement?

(4) The final product of a test is a (an)
a. series of items
b. score
c. interpretation of a score
d. performance

(5) In this book, the fit between a test and its objectives is called


(6) Criterion-referenced tests are used to determine

a. how a student's performance compares to that of a sample
group
b. whether a student likes coming to class
c. how successful he or she will be in college
d. which objectives a student has acquired competency in


(Answers to all Self-tests of Proficiency begin on page 495.)



chapter two/ Constructing 
Objectives 


OBJECTIVES 1. Describe a classroom system model and the role
of objectives in it.

2. State purposes for objectives in education.

3. Identify the three different parts of an objective,
namely action, conditions, and criteria.

4. Construct measurable objectives containing the
three different parts.

5. Identify and describe criteria for evaluating
objectives and the application of these criteria.

6. Use cognitive and affective taxonomies to classify
objectives.



A CLASSROOM MODEL


We can conceive of a simple model of the classroom as shown in
Figure 2.1 below.

If we consider the classroom as a system,¹ our description of
it must include a statement of its goals and objectives. The
following may be some of the goals of the classroom system:

to develop the reading and writing skills of students
to help students gain knowledge of math and science
to enable students to develop and clarify values
to facilitate the development of each student's self-concept
to help students acquire more positive attitudes toward school
to enable students to develop independent study skills

Each of these objectives can in turn be broken down into
other, more detailed objectives, but as they are they help us
define the classroom as a system. If we want to measure the
success of the system in meeting these objectives we need some
measurement instruments, but if we select measurement instruments


Figure 2.1 The Classroom As a Simple System


1 Churchman (1969) uses the following characteristics to describe a system:
its objectives, its performance measures, its constraints or limits, its
resources, its components and their functions, and its management.




successes or failures 


Applications of the Model

Consider the following occurrence in the light of the systems
model.


• Miss Logan wanted to teach the students in her ninth grade
social studies class to be better consumers. She was interested in
having them develop the knowledge and skills to be able to identify
the characteristics of a product and evaluate the quality of that
product. She was also anxious to see them buy items of better
quality and greater usefulness and be willing to spend the time
shopping around to get something at the lowest price without
sacrificing quality. Her last objective was that they would learn to
critically evaluate commercials and advertisements. She did not know
how to help her students learn these things, so she began making
inquiries.

Her inquiries brought a unit on consumer education to her
attention and a game called Consumer. She was able to convince
the high school principal to purchase the consumer education unit
and was also fortunate enough to borrow the game. She set aside
a portion of each class period to try out the unit and devoted two
full class periods to the game. After both the unit and the game
had been used, she gave her students a kind of take-home exam.
They were to research and evaluate some products and to judge
and critique some ads. She also gave them a record sheet and
asked them to keep track of their purchases and to evaluate them
for cost and quality using a procedure described in the unit. When
the take-home exams and record sheets were turned in, Miss
Logan studied them carefully. They told her that her students
were demonstrating the skills of making critical judgments and
evaluations, but that their own buying behavior did not reflect
these skills. "I think next time I had better concentrate less on
what to do and more on getting the students to do it," she said to
herself.


Miss Logan was functioning as a systems analyst without
knowing it. She had established some objectives for herself and
identified a subsystem that she hoped would meet these objectives.
Her subsystem had two components: (1) the unit, and (2) the
game. As the teacher, she managed the system. She developed
measures of the system's outcomes that she could then use to tell
whether the system had met her objectives. When she saw that
some objectives were not met, she decided to make certain changes
in the system.


The point to be made here is that measurement occurs in a
context, measurement occurs with respect to something,
measurement plays an educational role. Those elements that connect
measurement to the classroom system are called objectives.
Another example may provide further clarification.


• Kitty, a fifth grader, was a poor student in math. She
seemed to be falling further and further behind the class. Her
teacher, Mr. Washington, tried giving her supplementary material,
but it was ineffective in altering her performance. Finally, Mr.
Washington said, "I had better look at math more systematically."
He sat down one day and began mapping out all the things that
students must learn before learning the addition and subtraction
of fractional expressions, a skill Kitty couldn't master. He ended
up with a list of twelve items:

(1) adding 2 dissimilar fractions
(2) subtracting 2 dissimilar fractions
(3) expressing mixed numbers as improper fractions
(4) expressing improper fractions as mixed numbers
(5) supplying fractional equivalents
(6) identifying the lowest common denominator
(7) identifying common multiples
(8) adding 2 similar fractions
(9) [...]
(10) [...] of whole numbers
(11) identifying multiples
(12) dividing with a remainder

He then constructed a test which included items to measure
each of the components of his math system. After school
one day he gave the test to Kitty and scored it.
He discovered that Kitty did not know how to identify common
multiples (e.g., what goes into both 9 and 12 evenly?
answer: 3). Her difficulty with the addition and subtraction of
fractions was actually based on her not understanding division.
Because she could not identify common multiples, she had difficulty
in finding the lowest common denominator of two fractions and hence
adding them or subtracting them. Mr. Washington had thus found
the weak link in the system, the missing piece so to speak, and
could then concentrate on teaching Kitty how to overcome this
problem.
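
Mr. Washington's procedure, several test items per skill followed by a search for the skills that were missed, can be sketched as follows. The function name, the number of items per skill, the mastery threshold, and Kitty's recorded responses are all hypothetical; only the skill names come from the list in the case.

```python
def unmastered_skills(responses, mastery_threshold=1.0):
    """Return (skill, proportion correct) for each skill below threshold.

    `responses` maps a skill name to a list of True/False item results.
    The threshold of 1.0 (every item correct) is an assumption; a
    teacher might reasonably choose a lower one.
    """
    weak = []
    for skill, answers in responses.items():
        proportion = sum(answers) / len(answers)
        if proportion < mastery_threshold:
            weak.append((skill, proportion))
    return weak


# Hypothetical results for four of the twelve skills, two items each.
kitty = {
    "identifying multiples": [True, True],
    "identifying common multiples": [False, False],
    "identifying the lowest common denominator": [False, True],
    "adding 2 similar fractions": [True, True],
}

for skill, proportion in unmastered_skills(kitty):
    print(skill, proportion)
```

A report of this kind only locates the weak skills. Deciding which of them is the root of the difficulty still depends on the teacher's knowledge of how the skills build on one another.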


In the above example the teacher found it necessary to generate
a list of objectives before he could use measurement to identify
the basis for a learning difficulty. Kitty's
teacher could not measure Kitty's mastery of skills
until he could first specify what these skills should be. Again,
objectives served as the link between the learning system and
measurement of the characteristics and quality of that system.
This brings us to our next question: What are objectives?


WHAT ARE OBJECTIVES?


Many adjectives have been placed before the word objectives,
including instructional, behavioral, performance, measurable,
expressive, terminal, and enabling. An objective will be defined
here as an intended outcome stated in such a way that its
attainment (or lack of it) can be observed and measured. We might
broaden this definition by saying that an objective can be the
statement of an intended or prescribed characteristic, although
when used in relation to instruction those intended characteristics
typically represent intended instructional outcomes. Gagné (1974)
considers an instructional objective to be an expression of a
learning outcome in terms of human performance, including a
specification of the situation in which it is to be observed. Bloom,
Hastings, and Madaus (1971, p. 20) say that a statement of an
objective is an attempt by the teacher or curriculum maker to
clarify within his own mind or communicate to others the sought-for
changes in the learner. Mager (1962, p. 3) says that an objective is
an intent communicated by a statement describing a proposed change
in a learner, a statement of what the learner is to be like when he
has successfully completed a learning experience. It is a
description of a pattern of behavior (performance) we want the
learner to be able to demonstrate.

Objectives so defined may be reasonably called instructional
objectives since they represent the goals of instruction. (Moreover,
such objectives can be labeled as measurable or behavioral,


die word “ lieliavioral demonstration m the usual sense of 





relatively interchangeable terms, since they specify outcomes
in observable form.) For the most part, the tests that teachers
construct will represent an attempt by them to assess the
attainment by students of their instructional objectives. Test
objectives represent those performances or characteristics that a
test has been designed to measure. When tests are used to evaluate
the effects of instruction, test objectives will be represented by
the instructional objectives for the units of instruction being
evaluated.

Objectives have many values for the teacher as instructor,
primary among them being to aid the teacher as tester. Objectives
help you to determine what it is you want students to learn and,
hence, what you want to measure; they suggest further how to
measure what you want to measure; and they help you to determine
whether students have achieved what is intended and the areas in
which your instruction has been successful and unsuccessful.

In the area of teacher-built tests that are used primarily (but
not exclusively) to measure achievement, objectives are a necessary
starting point. Before preparing test items (... before preparing
instruction), objectives should be prepared ... serve to guide both
instruction and the construction of achievement tests.

Published tests have explicitly stated objectives and, if not
these, a detailed content outline. These objectives ... published
tests by examining them in light of their own objectives. Teachers'
own objectives are thus of assistance even in test selection.


PREPARING OBJECTIVES

When an objective is written in full or detailed form, it has three
parts:

(1) the action or behavior that the learner or test taker is to
perform,

(2) the conditions or givens under which the action or behavior
is to be performed, and

(3) the criteria by which the action or behavior is to be judged.



Action
Statements

Let us concentrate initially on the first component, the action or
behavior ... of the objective, and for shorthand purposes may be
used as the statement of the objective by itself. The emphasis is on
the word action ... The action or behavior ... a descriptive
statement of intended student performance ... response by the test
taker ... we can observe and measure whether a student can ...
something; hence, the emphasis ... demonstrate something. From these
observations or measurements we can ... understands or appreciates
... Action statements are the vehicle for this kind of inference.
Figure 2.2 presents a lengthy list of ... the taxonomies of the
cognitive and affective domains, which will be described later in
this chapter. ... in this figure the large number of usable verbs.


Figure 2.2 ... Taxonomies of the ...

I. COGNITIVE DOMAIN

a. Knowledge: ... identify, list, match, name, outline, ...

b. Comprehension: ... explain, extend, generalize, give examples,
... paraphrase, predict, re..., summarize

[The entries between Comprehension and Synthesis are missing from
this copy.]

e. Synthesis: categorize, combine, compile, compose, create, design,
devise, rewrite, summarize, tell, write

f. Evaluation: appraise, compare, conclude, contrast, criticize,
describe, discriminate, explain, justify, interpret, relate,
summarize, support

II. AFFECTIVE DOMAIN

a. Receiving: ask, choose, describe, follow, give, hold, identify,
locate, name, point to, reply, select, sit erect, use

b. Responding: answer, assist, comply, conform, discuss, greet, help,
label, perform, practice, present, read, report, respond, select,
tell, write

c. Valuing: complete, demonstrate, describe, differentiate, explain,
follow, form, initiate, invite, join, justify, propose, read,
recognize, report, select, share, study, work, write

d. Organizing: adhere, alter, arrange, combine, compare, complete,
defend, explain, generalize, identify, integrate, modify, organize,
order, prepare, relate, synthesize

e. Characterizing by a value or value complex: act, discriminate,
display, influence, listen, modify, perform, practice, propose,
qualify, question, revise, serve, solve, use, verify

A second, considerably shorter list has been compiled by the
American Association for the Advancement of Science (1965). This
list, with definitions and examples, is shown in Figure 2.3. A
third, even shorter list appears below. This list, based on the
second, provides five action verbs, one of which can be used for
writing any objective. This list also includes a description of the
activity used to measure the attainment of each type of action.

Identify: given a stimulus array, the student can point to (by
recognition) the specific stimulus required by instruction.

Distinguish: given two potentially confusable stimuli, the student
can point to (by recognition) the one possessing the specific,
predesignated property.

Describe: given an object or concept name, the student can state
(by recall) those characteristics of the concept or of the object in
a manner sufficient to "describe" it in accordance with its defined
properties.

Construct: given the name of a manufacturable object or concept
and sufficient equipment for doing so, the student can produce
the object or concept in a way consistent with its defined
properties.

Demonstrate: given a problem or performance request and all the
necessary elements for completing it, the student can carry out
a procedure sufficient for attaining the required performance by
virtue of its conformity to defined rules and resulting in the
appropriate outcome.

The advantage of working with such a short glossary of verbs as
appears above is that once the verb is chosen, the method of
measurement follows automatically.
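The idea that the verb fixes the method of measurement can be sketched as a simple lookup. The following is only an illustration, not part of the original glossary: the class name, the function name, and the `mode` labels for "construct" and "demonstrate" are assumptions (the glossary names a mode, recognition or recall, only for the first three verbs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementActivity:
    given: str     # what the student is presented with
    response: str  # what the student is asked to do
    mode: str      # how the response is produced


# Paraphrased from the five-verb glossary. The modes "production" and
# "procedure" are labels assumed here for illustration only.
ACTIVITY_FOR_VERB = {
    "identify": MeasurementActivity(
        "a stimulus array", "point to the required stimulus", "recognition"),
    "distinguish": MeasurementActivity(
        "two potentially confusable stimuli",
        "point to the one with the designated property", "recognition"),
    "describe": MeasurementActivity(
        "an object or concept name", "state its characteristics", "recall"),
    "construct": MeasurementActivity(
        "an object or concept name and sufficient equipment",
        "produce the object or concept", "production"),
    "demonstrate": MeasurementActivity(
        "a problem and all necessary elements",
        "carry out a procedure that yields the outcome", "procedure"),
}


def measurement_for(objective: str) -> MeasurementActivity:
    """Return the measurement activity implied by the leading action verb."""
    verb = objective.strip().split()[0].lower()
    if verb not in ACTIVITY_FOR_VERB:
        raise ValueError(f"'{verb}' is not one of the five action verbs")
    return ACTIVITY_FOR_VERB[verb]
```

An objective that begins with a verb outside the glossary, such as "understand," raises an error in this sketch, which mirrors the point that such verbs do not by themselves indicate how to measure.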

Now that the words to describe measurable action have been
presented, it may be helpful also to present the words that
describe inferences, hunches, and hopes rather than observable acts.
Chief among these are the following:


understand          be aware of

know                be sensitive to

appreciate


As instructional goals the above are vital, but as outcomes they
cannot be directly measured. They must be inferred from some
performance or behavioral act. For example, we may want students
to understand that many people in the world live in conditions
of poverty. We can ask them to describe such living conditions
or to demonstrate through statistics the extent of poverty in
the world, but we cannot ask them to understand it. If they can
describe it or demonstrate it, we may infer that they understand
it.


Figure 2.3 Nine Action Verbs for Use in Writing Objectives *

DEFINITION OF ACTION WORDS

The action words that are used as operational guides in the
construction of the instructional objectives are:

(1) ... (by pointing to, touching, or picking up) ... upon being ...
with a set of small animals ... pointing to or touching the dog, or
the child is asked to pick up the red triangle when presented with a
set of paper cutouts representing different shapes; he is expected
to pick up the red triangle. This class of performance also includes
identifying object properties (such as rough, smooth, straight,
curved) and, in addition, kinds of changes such as an increase or
decrease in size.

(2) Distinguishing (similarities and/or differences). Distinguish
between objects or events which are potentially confusable (square,
rectangle), or when two contrasting identifications (such as right,
left) are involved.

(3) Constructing. Generating a construction or drawing which
identifies a designated object or set of conditions. Example:
Beginning with a line ..., the request is made, "Complete this
figure so that it represents a triangle."

(4) ... block."

(5) ... two or more objects or events in proper order in ...
Example: "Arrange these moving objects in order of their speeds."

(6) Describing. Generating and naming all of the necessary
categories of objects, object properties, ... that are relevant to
the description of a ... Example: "Describe this object," and the
observer does not ... the categories which may be generated by
mentioning them, as in ... "Describe the color and shape of this
object." The ... description is considered sufficiently complete
when there is a probability of approximately one that any other
individual is able to use it to identify the object ...

(7) Stating. Makes a verbal statement (not necessarily in technical
terms) that conveys a rule or ... including the names of the proper
classes of objects or events ... Example: "What is the test for
determining whether ...?" ... response requires the ... of a
straightedge, in various directions, to determine touching all along
the edge for each position.

(8) Demonstrating. ... the operations necessary to the application
of a rule or principle. Example: "Show how you would tell whether
this surface is flat." The individual must use a straightedge to
determine flatness by touching of the edge to the surface at all
points.

(9) Explaining. The child should be able to take two or more pieces
of data and describe relationships between or among them. For
example, he may describe the relationship between a pencil, a known
object, and a pen, an object new to him.


* Adapted with permission from the A.A.A.S. Copyright 1965 by The
American Association for the Advancement of Science.


By their very nature, not all of our instructional objectives will
be measurable. When unmeasurable objectives are important,
they should be retained. However, where possible, objectives
should be in observable terms (particularly if they are going to be
of any assistance in measurement) even though they are based on
or derived from more general goals like "understanding."

Let us consider the action portion of objectives starting out 
with more general goals. Suppose a chemistry teacher would like 
students to understand Boyle’s Law. One manifestation of under- 
standing Boyle’s Law should be the ability to use it. Hence, one 
objective may be students' ability to demonstrate a procedure for 
calculating the temperature (or pressure) of a gas. Suppose an ele- 
mentary school teacher wanted his or her students to understand 
the transitive property of numbers and mathematical sentences. 
Students who know a procedure should be able to demonstrate it; 
hence, students should be able to demonstrate a procedure for 
writing a mathematical sentence that illustrates the transitive 
property. The third illustration takes us to music appreciation,
where the music teacher wants students to appreciate Beethoven's
Fifth Symphony. Perhaps the music teacher would agree that
students who have gained the desired appreciation will describe their
feelings toward the symphony in positive terms (presumably in
order to give the teacher feedback as to their appreciation or lack
of it). The American history teacher is concerned that students be
aware of the causes of the War between the States. Since the
history teacher is likely to measure this awareness by giving the
students a list of statements and asking them to check the ones that
represent causes of the War between the States, we might say that
he or she expects them to be able to identify statements that
represent causes of the War between the States. (Note that this
objective, as many, can be made measurable by replacing the in-action
verb, "be aware of," with a suitable action verb, in this case,
"identify.")

The high school English teacher is teaching students to know 
how playwrights create characterizations. Those who know could 
be expected to be able to either demonstrate a process for arriving 
at a characterization or describe one character in writing. Mean- 
while the teacher of a new vocational program in data processing 
wants his or her students to understand the key punch machine. 
The teacher may choose to evidence whether they do or do not by 
their ability to (demonstrate a procedure for) operating a key 
punch machine. And finally, Langston Hughes' admonition to dig
all jive would probably come out in "objective-talk" as being able
to (demonstrate a procedure for) carrying out a conversation in at
least two English "languages" (i.e., speak in at least two English
languages).



Conditions and
Criteria

... behavior is to be judged ... of the objective ... at the
beginning ... with the word "given" ... The criterion ... typically
appears at the end of the objective following the action statement.
The illustrations used ... the action of an objective in the
preceding section have ... a statement of conditions and criteria
and appear below.

... the student can demonstrate a procedure for calculating ...

... a = b and b = c ... student can demonstrate a procedure for
writing a third sentence that ...

... Fifth Symphony and ... can describe his or her feelings in
positive terms ...

... War between the States ... been designated as acceptable by the
teacher ...

... play, plot, and theme ... and relating ...

... description to include the character's basic qualities as
manifested in the play, his or her relation to other characters,
and relation to the play's plot and theme.


Given a sheet of data in proper form, the student can demonstrate
a procedure for punching it into proper columns using a key
punch and programmed control within ten minutes.


Given ten phrases in "street talk," the student can state the
equivalent or translated phrases in standard written English, being
completely accurate (to the satisfaction of the English teacher)
on nine out of ten.


Even though ... abbreviated form of action statement, ... statement
... conditions and criteria by which the action is to be
evaluated.
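The three-part structure of a full objective (conditions, action, criteria) can be pictured as a small record. The sketch below is an illustration only: the class and method names are hypothetical, and the example values are taken from the key punch objective above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Objective:
    action: str                       # what the student is to do
    conditions: Optional[str] = None  # the "givens"
    criteria: Optional[str] = None    # standard for judging the action

    def is_full(self) -> bool:
        """A full objective has all three parts; shorthand has action only."""
        return bool(self.conditions) and bool(self.criteria)

    def statement(self) -> str:
        """Assemble the parts in the order used in the examples above."""
        parts = []
        if self.conditions:
            parts.append(f"Given {self.conditions},")
        parts.append(f"the student can {self.action}")
        if self.criteria:
            parts.append(self.criteria)
        text = " ".join(parts)
        return text[0].upper() + text[1:] + "."


key_punch = Objective(
    action=("demonstrate a procedure for punching it into proper columns "
            "using a key punch and programmed control"),
    conditions="a sheet of data in proper form",
    criteria="within ten minutes",
)
shorthand = Objective(action="describe one character in writing")
```

Here `key_punch.statement()` reproduces the full objective, while `shorthand` holds only the action part and so reports `is_full()` as false.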


The objectives that represent the end result of an instructional
unit are often called terminal objectives ... school year, a teacher
... in a subject matter area ... teacher's testing program will
reflect an attempt to determine whether these objectives have been
met.

There is also another kind of objective ... enabling objectives ...
that build upon one another ... Teachers who test to ... of enabling
objectives during the course of each ... position not only to
estimate mastery of terminal objectives ... diagnose sources of
failure where failure ... diagnose or detect ... basis for
diagnostic ... areas in which learning ...

Unit objectives, the ... are comprised of both terminal objectives
and enabling ... Course or curriculum objectives would be a set of
terminal objectives. However, because it is often arbitrary where
one ... of instruction begins and another ends, what are ...
objectives in one segment, like a unit, become enabling ... for a
larger segment, like a course. ... a mathematics unit on solving
equations with ... is shown on the next page.


Unit Objectives

Unit Terminal Objective

Unit Enabling Objectives

... an equation by adding and subtracting terms to both ...

Simplify an equation by multiplying and dividing both sides by ...

Clear an equation of fractions

... an equation by adding and subtracting numbers to both ...

... number and dividing both sides by ...

Supply product and quotient equivalents to products and quotients

Identify needed operations in order

Add and subtract terms in sequence

... (numbers and terms) ... to sums and differences

...

Simplify fractional expressions

Factor

Identify the ...

... equivalents ...

...



EVALUATING OBJECTIVES


How can you, the teacher, tell whether you have written the right
objectives? How can you tell that these are the outcomes you
should be aiming at for your students? Kriege (1971, p. 142)
suggests that the teacher judge objectives against the following ten
criteria:

Written in terms of student performance?

Observable by one or more of the five senses?

Specific enough to be meaningful?

Valid in relation (i.e., relevant) to the major objective ...?

Measurable in terms of level of performance and conditions
under which the performance is to take place?

Sequential in relation to prior and subsequent objectives?

Relevant to the student's experience?

Attainable within the time period allotted?

... vague, undefinable terms ...

Others of the above criteria ... the relation of the objective to
the ... situation (e.g., valid, sequential, attainable). Tuckman and
Edwards (1971, p. 22) suggest the three questions in Figure 2.4 for
validating an objective based on its ... other objectives. An
objective ... invalid because it appears too late or too soon in the
sequence ... because the proper prerequisites fail to precede it ...
objectives we need or ... appear in the sequence ... these decisions
on a logical basis, we can make them ... seeing how students perform
on a test based on the objectives. ... students uniformly fail an
objective (like B in Figure 2.4) ... uniformly pass the one it
presumably feeds into (like A in Figure 2.4), then objective B has
little validity. Failure to pass ... after passing ... often means
that an objective is ... (or else that instruction on A itself was
insufficient). Performance data can be used not only to validate an
objective in terms of ... to other objectives but to test the
attainability of an objective within an allotted period of time.


Figure 2.4 Validating an Objective in Terms of Its Place in the
Sequence

(1) ... performance on given objectives (e.g., B, C) contribute to
performance on a subsequent objective (e.g., A)?

(2) Does performance on a given objective (e.g., B) depend on or
require performance on an objective that precedes it (e.g., D)?

(3) Is performance on a given objective (e.g., C) independent of
performance on a parallel objective (e.g., B)?
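The use of performance data to check an objective's place in the sequence can be sketched as a comparison of pass rates. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the cutoffs of 0.2 and 0.8 are arbitrary stand-ins for "uniformly fail" and "uniformly pass," which the text does not quantify.

```python
from typing import Sequence


def pass_rate(passed: Sequence[bool]) -> float:
    """Proportion of students who passed an objective."""
    if not passed:
        raise ValueError("no results supplied")
    return sum(passed) / len(passed)


def check_prerequisite(prereq_passed: Sequence[bool],
                       later_passed: Sequence[bool],
                       low: float = 0.2,
                       high: float = 0.8) -> str:
    """Compare results on an objective (B) and the one it feeds into (A).

    `low` and `high` are assumed thresholds, not values from the text.
    """
    b = pass_rate(prereq_passed)
    a = pass_rate(later_passed)
    if b <= low and a >= high:
        # Nearly everyone fails B yet passes A: B is not a real prerequisite.
        return "prerequisite has little validity"
    if b >= high and a <= low:
        # Nearly everyone passes B yet fails A: look at A or its instruction.
        return "review the later objective or its instruction"
    return "no flag"
```

For example, ten failures on B paired with ten passes on A returns "prerequisite has little validity," the case described in the passage above.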

The remaining three criteria (relevance, challenge, and
acceptability) refer to the reaction by various audiences to the
objectives. These criteria are the most judgmental of the ten. To
apply them, the teacher would have to gain the reaction of experts,
students, and parents through meetings and discussion groups as well
as by questionnaire. One of the positive features of objectives is
that they make public scrutiny and public acceptability of the
curriculum possible. Indeed, if instruction occurs without
thought-out objectives, this kind of public acceptability cannot
occur.

All objectives can be evaluated as to form and structure.
Enabling objectives in particular can be evaluated in terms of their
relation to one another. Terminal objectives can be evaluated not
only in their relation to other objectives but, perhaps more
importantly, by the reaction they provoke from various audiences. Not
only do students and parents represent potential sources of
feedback; other teachers, subject matter experts, and curriculum
designers as well can comment on the potential relevance,
challenge, and acceptability in terms of students' patterns of
development and career goals. The fact that objectives are tangible
and visible (at least potentially so) makes them particularly suited
to the process of public examination. Their relation to one another
and potential for sequencing make them particularly useful for
instructional purposes, and their form and structure make them
useful for test design, since they help the teacher determine what
performances to measure for and the conditions and criteria that
will be applied in the measurement situation.


Box 2.1


Arguments against Behavioral Objectives

W. James Popham, one of the modern pioneers in the development of
behavioral objectives (and director of the ... change at UCLA),
probed the validity of eleven arguments against behavioral
objectives (Popham, 1968). Critics claim, he argued, ...

1. ... easier to operationalize, trivial goals will be emphasized at
the ...

2. ... prevents the teacher from capitalizing on the ...

3. other educational outcomes equal pupil behavior changes in
importance

4. measurability is ... mechanistic, hence dehumanizing

5. planning how ... behave is undemocratic

6. it's unrealistic to expect teachers to state measurable goals

7. in certain subject areas ... measurable outcomes is difficult

8. precise statements of educational goals would reveal them as
innocuous

9. measurability implies accountability and is thus threatening

10. it is a time-consuming ... task to state goals in measurable ...

11. ... specified goals blind evaluators to the important but
unanticipated outcomes

Are these arguments ...? Popham proceeds to argue against each.
Operationalizing tends to make the triviality of certain outcomes
apparent so that we can reject them. ... not stifle spontaneity in
teachers, students, or evaluators. The primary business of the
school is promoting desirable pupil behavior. And so on ...
countering the eleven arguments.



THE WHAT AND WHY OF TAXONOMIES

A taxonomy is a device for classifying things in terms of certain of
their characteristics; thus, it identifies the relation of one
thing to another in terms of these characteristics. As is generally
known, taxonomies exist for classifying plants and animals and
for classifying chemical elements. What we are concerned with
here are taxonomies of educational objectives, that is, of the goals
of our educational system or parts of it. Bloom (1956) suggests
that such taxonomies will help teachers (a) define nebulous terms
such as "understand" so that they can communicate curricular
and evaluative information among themselves, (b) identify goals
that they may want to include in their own curriculums, (c) identify
directions in which they may want to extend their instructional
activities, (d) plan learning experiences, and (e) prepare measuring
devices. While we are primarily concerned here with the last of
Bloom's points, it is unwise to isolate the measurement aspects of
taxonomies from their other features.

Taxonomies are devices of human origin that not only help
teachers to label objectives in terms of one or more of their
properties but also to get some idea of the sequences in which
objectives may best occur, thus contributing to their validation.*
This latter feature is based on the fact that many taxonomies attempt
to be hierarchical, that is, organized into levels or ranks. Let us
illustrate these points with reference to a taxonomy of the
cognitive domain (Bloom, 1956, usually called Bloom's Taxonomy),
which is summarized in Figure 2.5.


Cognitive
Taxonomy

Consider the elementary teacher who is interested in teaching
his or her students how to research a social studies topic. The
objective might start out vaguely as something like "knowing
where to go to get information about a topic." In essence, the
teacher is interested in having students acquire knowledge; more
specifically, it is knowledge about ways and means of dealing with
specifics (rather than knowledge about the specifics themselves).
That is, the teacher does not in this objective want the students to
learn something about a topic such as "deserts" but to learn ways
and means of finding out about that or other topics.





Figure 2.5 Taxonomy of the Cognitive Domain *

1.00 KNOWLEDGE
1.10 of Specifics
1.11 of terminology
1.12 of specific facts
1.20 of Ways and Means of Dealing with Specifics
1.21 of conventions
1.22 of trends and sequences
1.23 of classifications and categories
1.24 of criteria
1.25 of methodology
1.30 of the Universals and Abstractions in a Field
1.31 of principles and generalizations
1.32 of theories and structures

2.00 COMPREHENSION
2.10 Translation
2.20 Interpretation
2.30 Extrapolation

3.00 APPLICATION

4.00 ANALYSIS
4.10 of Elements
4.20 of Relationships
4.30 of Organizational Principles

5.00 SYNTHESIS
5.10 Production of a Unique Communication
5.20 Production of a Plan, or Proposed Set of Operations
5.30 Derivation of a Set of Abstract Relations

6.00 EVALUATION
6.10 Judgments in Terms of Internal Evidence
6.20 Judgments in Terms of External Criteria

* ... McKay Company, Inc. From the book entitled ... permission of
the publishers.


According to the taxonomy, knowledge of ways and means
of dealing with specifics is of a higher order, that is, is more
advanced or complex, than knowledge of specifics. Most
specifically, the objective in question, knowing where to go to get
information about a subject, falls into category 1.25, knowledge of
methodology (the most advanced of the knowledge of ways and
means categories). By referring to Figure 2.2 and looking under
"Knowledge," a teacher could find a set of action verbs that are
useful for writing "knowledge" objectives. In this particular
instance, the teacher could say, for example, that he or she wanted
the students to be able to describe ways to get information about
a social studies topic (e.g., deserts). Given this objective, the
means for measuring it are fairly obvious, although there are
undoubtedly a number of different test items that could be written
for eliciting the desired behavior.

Now that the teacher has dealt with the task of getting
information, he or she will probably become concerned with having
students "understand" the information they collect. This moves us
into the second level of Bloom's Taxonomy, "Comprehension."
Perhaps the teacher's concern for comprehension will fall into the
category "2.20 Interpretation." Using an action verb from the list
in Figure 2.2, he or she may formulate the objective to explain
why the life styles of desert dwellers throughout the world take
a similar form.

The teacher may feel that comprehension is not a sufficient
place to stop and may go on to "Application." He or she may want
students to produce a model of a dwelling that they could use if
they were going to spend their summer in the desert. From here
... the feelings of the desert dwellers ... "Evaluation," ... they
like about desert life ... about their own life.

Bloom's ... helpful in developing tests to determine students' ...
skills. It can also help teachers gain more insight into ... goals
and into the relationship between their goals and instructional
activities. Perhaps most importantly, the taxonomy enables teachers
to better identify the level of their activities so that ...
increasing levels of complexity. Rather than ... objectives to the
levels of knowledge ... encouraged by the taxonomy ... into
application, analysis, synthesis, and evaluation.

Affective
Taxonomy

A second taxonomy of educational objectives has been developed
for the affective domain (... means believing, emoting, or feeling
... thinking, perceiving, or doing) by Krathwohl et al. ... An
illustrative version of this taxonomy appears in Figure 2.6. (... of
the categories in the affective taxonomy, an ... objective has been
added using the "desert life" motif.) Like the ... cognitive domain,
the taxonomy of the ... identifies a sequence of levels that may be
used ... structuring instructional experiences or developing test
items, the latter being our concern ... Figure 2.2 also provides
action verbs for preparing objectives in the affective domain, the
procedure being the same as that outlined for the cognitive domain.


Figure 2.6 An Illustrated Taxonomy of the Affective Domain *

1.0 RECEIVING (ATTENDING)

1.1 Awareness
Describe the aesthetic factors in the clothing, food, and shelter
that desert dwellers use to satisfy basic needs.

1.2 Willingness to Receive
Identify books that have been read voluntarily about desert life.

1.3 Controlled or Selected Attention
Reply to questions raised by teacher on aspects of desert life.

2.0 RESPONDING

2.1 Acquiescence in Responding
Present an assigned report on desert life.

2.2 Willingness to Respond
Respond with apparent interest and zeal to assignments on desert
life.

2.3 Satisfaction in Response
Report pleasure in having studied people of the desert.

3.0 VALUING

3.1 Acceptance of a Value
Recognize that children in all cultures have similar basic needs.

3.2 Preference for a Value
Demonstrate a desire to study and understand people of different
cultures.

3.3 Commitment
Write a letter to a desert child expressing recognition of your
common needs.

4.0 ORGANIZING

4.1 Conceptualization of a Value
Identify a continuum or hierarchy of basic human needs that each
person must be able to satisfy.

4.2 Organization of a Value System
Prepare a plan for satisfying one's own basic needs and helping
others to satisfy theirs.

5.0 CHARACTERIZING BY A VALUE OR VALUE COMPLEX

5.1 Generalized Set
Display tolerance of human behavior directed toward need
satisfaction.

5.2 Characterization
Practice tolerance as part of an operational philosophy of life.


* Copyright © 1964 ... Taxonomy of Educational Objectives ...
Reprinted with ... publishers. Illustrations were provided by
Tuckman.


OBJECTIVES, TAXONOMIES, TESTING, AND TEACHE 


... can perhaps be ... of this chapter ... of the classroom ... The
classroom is a small system ... a teacher and a group of students
operate. The mission of the system is that the students learn and
grow. The responsibility for ... largely with the teacher. Thus, the
essence ... is that the teacher arranges the conditions ...
including his or her own behavior so that students increasingly
learn and develop. Mission attainment in any system ... when the
goals of the mission are spelled out ... objectives. Rather than
merely representing an ..., the writing or choosing of objectives
provides the teacher with a set of goals or targets toward which to
aim ... classroom. Whether the goals are formalized or not, ...
teachers have them. Objectives are merely a way to ... visible.

Why does a teacher give tests? There are three primary reasons:
(1) to ... each student is experiencing learning ... development
(this is the monitoring and certification function), (2) to identify
those students who are not learning ... and in particular the area
of their deficiency (this is the diagnostic function), (3) to
determine whether instructional inputs are, in general, effective
(the program evaluation function).

As we have seen, objectives help the teacher evaluate the
appropriateness of his or her tests and test items. Objectives also
help the teacher know what to test for; they form the basis for the
development of tests and test items. Objectives represent a definite
point of reference in the classroom system, both for instruction
and testing. Since tests attempt to measure the attainment of the
teacher's goals, and objectives are formal statements of goals, then
objectives tell a teacher what to measure. The activities of the
teacher with respect to goal setting, teaching, and testing are
shown in Figure 2.7.

The primary purpose of classifying objectives is to gain
additional insight about the levels of instruction and the
relationship between instructional goals. But classified objectives
also facilitate the preparation of test items. If one's
instructional goal is knowledge acquisition, measuring for
comprehension or synthesis would be inappropriate and unfair. If
one's instructional goal is analysis, measuring simply for knowledge
would be equally inappropriate. Not only do the taxonomies help you
write items at the intended level (as will be seen in Part Two),
they help you check on the appropriateness or validity of items for
the intended purpose (as will be seen in Part Three).
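The check described above, that an item should measure at the same taxonomic level as the goal it serves, can be sketched as a comparison of positions in the cognitive taxonomy. This is an illustration only; the function names are hypothetical, and classifying each item into a level is assumed to have been done already by the teacher.

```python
# The six major levels of the cognitive taxonomy, lowest to highest.
COGNITIVE_LEVELS = [
    "knowledge", "comprehension", "application",
    "analysis", "synthesis", "evaluation",
]


def level_rank(level: str) -> int:
    """Position of a level in the taxonomy (0 = knowledge)."""
    return COGNITIVE_LEVELS.index(level.lower())


def item_matches_objective(objective_level: str, item_level: str) -> bool:
    """True when the item measures at the level the objective intends."""
    return level_rank(objective_level) == level_rank(item_level)


def mismatched_items(objective_level, item_levels):
    """List (item number, item level, direction) for items off the level."""
    target = level_rank(objective_level)
    flagged = []
    for number, item_level in enumerate(item_levels, start=1):
        rank = level_rank(item_level)
        if rank != target:
            direction = "above" if rank > target else "below"
            flagged.append((number, item_level, direction))
    return flagged
```

With an analysis-level goal and items classified as analysis, knowledge, and evaluation, `mismatched_items` flags the second item as below the intended level and the third as above it.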


Additional Information Sources

Armstrong, R. J., et al. Development and evaluation of behavioral
objectives. ... Charles A. Jones Publishing, 1970.

Gerhard, M. Effective teaching strategies with the behavioral
outcomes approach. Nyack, N.Y.: Parker Publishing, 1971.

Johnson, R. A., Kast, F. E., and Rosenzweig, J. E. The theory and
management of systems, 2nd ed. New York: McGraw-Hill, 1967.

Kibler, R. J., Barker, L., & Miles, D. Behavioral objectives and
instruction. Boston: Allyn & Bacon, 1970.

Mager, R. F. Preparing instructional objectives. Palo Alto, Calif.:
Fearon Publishers, 1962.

McAshen, H. H. Writing behavioral objectives. N.Y.: Harper & Row,
1970.

Vargas, J. Writing worthwhile behavioral objectives. N.Y.: Harper &
Row, 1972.



Constructing Objccthes 


Self-test of Proficiency

(1) Define an objective and give three reasons why the classroom
system needs objectives.

(2) Think of a kitchen as a system. List four goals or objectives of that
system.

(3) State two activities of teachers for which objectives can serve as a
guide.

(4) If you were a student, what are two ways that knowledge of the
teacher's objectives would be of help?

(5) Which one of the following is part of an objective?
a knowledge to be understood
b appreciation to be felt
c action or behavior to be performed
d awareness or sensitivity to be developed

(6) The conditions of an objective represent the standards by which
performance on the objective is to be judged.
TRUE FALSE

(7) Write a full (three-part) objective for the goal: to know the location
of the longest river system in the United States.

(8) Write a full objective for the goal: to be able to add two 3-digit
numbers.

(9) Which of the following is not a criterion of a good objective?
a measurable
b reliable
c specific
d challenging

(10) In evaluating an objective, explain what is meant by the criterion
sequential and how this criterion would be applied.

(11) Into which one of the categories of the Taxonomy of the Cognitive
Domain would the following activity be best classified: producing a
unique plan for enlisting community support?
a comprehension
b application
c analysis
d synthesis
e evaluation





(12) Which one of the following objectives would best be classified in
the "valuing" category of the Taxonomy of the Affective Domain?
a Prepare an outline for helping other people meet their needs.
b Name five books that deal with characters who are emotionally
supportive of one another.
c Demonstrate by your actions the desire to help other people.
d List your ten best friends in the order of their importance to you.



chapter three /Basing 
Test Items on Objectives 


OBJECTIVES 1 Identify an appropriate test item for measuring
a given objective.

2 Identify three areas into which objectives can be
usefully classified, namely, knowledge and
comprehension, higher cognitive processes, and the
affective domain.

3 Classify given test items into each of the three
areas.

4 Prepare objectives in shorthand form (that is,
in action part only).

5 Prepare a content outline for a given topic or
objective as the first step in test construction.



THE RELATIONSHIP BETWEEN OBJECTIVES AND TESTS


It has been emphasized before, and will be again throughout this
section, that tests are constructed to measure whether objectives
have been met. A test is defined as a sample of student performance
on items that have been designed to measure preselected
objectives. Even if objectives are not explicitly stated, tests still
measure the performance of students. In order to insure that your
tests and the performances they require are related to (i.e., measure)
the objectives you want them to measure, it is important to
state them objectively, if only in "shorthand" form (that is, action
portion only). The objectives, once stated, will help you determine
the test items you need to construct to measure mastery
of those things you intend for students to master. The use of
objectives in instruction or in communication of aims are also

''"'’Tu"I?eToT«ady io'begin kaXglw to construe, your 

own tests most of ™J^^4“ln‘*th°sS^ that 

mastery of the (I) preparation of objec 

test construction ‘noludes t P evaluation of test 

tives, (2) preparation of test 

Items Each step is J .j' Uustrate the relationship between 

pose of this chapter IS m 0) ,ests, (2) 

srE se: s “ ;s 

riii® £ — . 1.™ ■■ 

a content outline 

math m the seventh grade She had just 
• Miss Hart teach ^ jjmjems a test to 

finished a unit on se« n" , she sat down and 

see how much ‘hcT ^ ^ad been trying to teach them She went 
thought about what sh following objectives 

over her l=-on P - ^ ot a sel, 

(1) Given a statem represent the statement 

use form use set builder notation to repre 

(2) Given a set m r 
sent the set 

. pedagogy, of course had Miss Hart nr, lien her 
r It would have b«n “',„on rathe" than after 
objectives before mb 


Some 

Illustrations 
of Objectives 
and Tests 




(3) Given terms used to describe sets and their relationship, that
is, equal, empty, sub, and proper, identify the correct
definition of each.

(4) Given two or more sets, identify those that are equal.

(5) Given a set and the empty set, identify their relationship and
distinguish between them.

(6) Given two or more sets, identify those that are subsets of and
those that are proper subsets of one of the given sets.

(7) Given a set, state the number of elements in it.


After further consideration she decided (a) that objectives 1, 2, 6,
and 7 were equally important and that they were more important
than the others, (b) that objectives 3 and 4 were of equal but
intermediate importance, and (c) that objective 5 was least important.
The greater importance of objectives 1, 2, 6, and 7 is based on their
greater complexity and the fact that they form the basis for
subsequent performance. She then decided on ten exercises with a
total value of twenty-eight points: five points to measure each of
the objectives 1, 2, 6, and 7, three points to measure objectives 3
and 4, respectively, and two to measure objective 5. Her next task
was to write the ten exercises. Because the objectives gave her a
strong clue as to what each item should be like, writing the test
was not difficult. Miss Hart's test is shown in Figure 3.1.
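Miss Hart's allocation can be checked with a little arithmetic. The short Python sketch below is an illustration only and is not part of the original text: the names PLANNED_POINTS, EXERCISES, points_by_objective, and mismatches are hypothetical, and the per-exercise values are simply read from the point labels in Figure 3.1.

```python
from collections import defaultdict

# Points Miss Hart planned for each objective (from the paragraph above).
PLANNED_POINTS = {1: 5, 2: 5, 3: 3, 4: 3, 5: 2, 6: 5, 7: 5}

# (exercise number, objective measured, points), read from Figure 3.1.
EXERCISES = [
    (1, 1, 5.0), (2, 2, 5.0), (3, 3, 3.0),
    (4, 4, 1.5), (5, 4, 1.5),
    (6, 5, 1.0), (7, 5, 1.0),
    (8, 6, 3.0), (9, 6, 2.0),
    (10, 7, 5.0),
]


def points_by_objective(exercises):
    """Add up the points that the exercises give to each objective."""
    totals = defaultdict(float)
    for _number, objective, points in exercises:
        totals[objective] += points
    return dict(totals)


def mismatches(planned, exercises):
    """Return {objective: (planned, actual)} wherever the two differ."""
    actual = points_by_objective(exercises)
    objectives = set(planned) | set(actual)
    return {
        obj: (planned.get(obj, 0), actual.get(obj, 0))
        for obj in objectives
        if planned.get(obj, 0) != actual.get(obj, 0)
    }


if __name__ == "__main__":
    print("Total planned points:", sum(PLANNED_POINTS.values()))
    print("Points per objective:", points_by_objective(EXERCISES))
    print("Mismatches:", mismatches(PLANNED_POINTS, EXERCISES))
```

A check of this kind is useful whenever several exercises share one objective, as exercises 4 and 5 do here: the exercise values should add back up to the weight the teacher assigned to the objective.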


Figure 3.1 Miss Hart's Seventh Grade Math Test *


(1) Use mathematical symbols to indicate the following:
a C is equal to the set whose elements are 5, 10, and 15
b 5 is a member of set C
c 7 is not an element of set C
d The set of all elements x such that x is an even integer
e The set of all elements y such that y is an odd number greater
than 7
(Objective 1; 5 points)


(2) Use set builder notation to indicate each of the following sets:
a {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
b {1, 3, 5, 7, 9}
c {4, 6, 8, 10, 12}
d {a, b, c, d, e, f, g}
e {s, t, u, v, w, x, y, z}
(Objective 2; 5 points)



(3) Connect each term at the left with the correct definition at the right.

a equal sets
b the empty set
c subset
d proper subset

i every element of the set is also an element of the other set
ii every element of the set is also an element of the other set
and vice versa
iii every element of the set is also an element of the other set
but the reverse is not true
iv the set contains no elements

(Objective 3; 3 points)

(4) Which of the sets listed below are equal?

A = {1, 3, 5, 7}

B = {2, 4, 6, 8}

C = {x/x is an odd number and x is less than 9}

D = {5, 3, 7, 1} (Objective 4; 1½ points)

(5) Which of the sets listed below are equal?

A = {1, 2, 3, 4}

C = {y/y is an even number and y is less than 9}

D = {2, 4, 6, 8} (Objective 4; 1½ points)

(6) Given A = {a, b, c, d} and the empty set ∅:

a Is every element in ∅ in A?

b Is every element in A in ∅?

c Is ∅ a proper subset of A? (Objective 5; 1 point)


) Explain the difference between^ and (Objectives i point) 

, Wh, chonhe,o„ow,n9,e.s are subsets a b,7.A„ 

a (=n) ' It, 7) 

“ h ii,b,7,a) 

d {a,b,7,A} , 

g ^ ‘ ' (Objective 6 3 points) 

4 am n are proper subsets of the given set? 

9) Which of the sets m item [Objective 6 2 points) 





(10) State the number of elements in each of the following sets.

b {a,D.0} e / 

c (□.i) 


(Objective 7; 5 points)

Modern 

court Brace Jovanovich Inc ™ ^ranam. Copyright © 1970 by Har- 


time ,s spent ,n teaching rradi'nl A °f her 

■a administered once a year to^assesf 

performance, but order to nrovid “vcrall reading 

reading performance she finds if nee ^ continuous monitoring of 

t^wo weeks. Before beginning a unlf iu °"ee every 

hen..o„ she listed fo^her o^ “ comprZ 

accomplishment. ® following aims for student 

(1) Given a picture nam 

(2) °“en 

Answers " — — 

[1] a c-{s_ioigj^ b ser 

t2) a (x/?fraS?f ^'^>""7)*' '3 an even Integer! r , 

^ fx/x Is a letter 9reater than 2 an.! t 

(S) C-D 

I?) I r*‘ * yes 





Although the five objectives were only a portion of Mrs. Morris'
goals, she felt that they were representative of the reading
comprehension skills that her students should possess. At the
completion of the unit she made up a short test based on the five
objectives. It is shown in Figure 3.2.


Figure 3.2 Mrs. Morris' First Grade Reading Comprehension Test *



Tell me whom and what you see in this picture Tell me what is happen- 



nicture 4 comes after pictures 1, 2, and 3 and not 
Tell me the reason that picture i 

betore them or In the middle {objective 2 ) 





(3) You have read three stories: Angus and the Ducks, Blueberries for
Sal, and Michael Who Missed His Train. In each story there were
animals. Tell me three ways that the animals in each story were like
one another.




(4) You have read the story, City Streets and Country Roads, about life in
the country and life in the city. Tell me three ways that life in the city
is different from life in the country.

(objective 4)

Jlie bears 'hat 


(objective 5) 

* Adapted from Instructional Objectives Exchange, Reading K-3.


a course 

course a^e shown objectives for the first half of the 

f hteran!T"'"*'’'‘“'^'"“ 

t figurative level whanhr™ descrtbcs, 

sage IS— using the Do<»m v Poem s underlying mes 

e petaonal level “nfi 

«> n Identify the tone of a P"™ 

^ J'« natter and audten^, and ^ 

tone IS revealed'"’" ’’Y which the poem’s 

<> describe m a short essay th “^sunations, and 

“"d effect of these words 


sitting on chairs eating fro 





(4) Demonstrate an understanding of the relationship between
figurative language and meaning in poetry:

a. identify figures of speech in poetry,
b. describe the feelings and ideas contained in each, and
c. describe in a short essay their importance to the meaning
of the poem.

(5) Demonstrate an understanding of the function of repetitive
sound in poetry (meter or rhyme):

a. identify regular and irregular patterns of meter and rhyme
in a poem, and
b. describe in a short essay the contributions of the
regularities and irregularities to the poem's meaning.






Mr. Emerson’s exam is shown in Figure 3.3. 


Figure 3.3 Mr. Emerson's Eleventh Grade Poetry (English) Exam *

Smner by A E. Housman; that la, deacr.be Iha slo^^ that the poem 

' o;'h'lb“; a psTsrS ■>' experienced 

"the color cl hia/her halr'^or'skln"”"^"’'* 

(=)Wha,,s,be,„„eoMbepoe...|„d , ... 

the tone revealed? n Just— by e e. cummIngs? How is 

( 3 ) Daei* k u (objective 2} 

Chty" hVedwarAto'gtorR^w^^^^ ''“t" "Rlehard 

(unction in the poem? ” common. What Is their 

crown (line 3) , 

9htlered„™a, 'rg ^r^’ arrayed (line 5) 

(t) Name Ibrea f,g„,es cl . Wecllvea) 

h'r’es -dederctnl;:^ byTaTh^M^rse 

(SI Is the meler cl the pcem "File t 







(1) The student will be able to define and describe each of the
following processes in terms of (a) when and (b) in what exact
manner it occurs:

1 mitosis

2 meiosis (sexual reproduction)

3 asexual reproduction

4 natural selection

5 Mendelian heredity

(2) The student will be able to describe and explain how each of
the above six processes in its own way maintains the
continuity of life, that is, makes it possible for certain forms of

(3) ThVs”:vill be able to contras, tbe above six processes in 
terms of their 

a simplicity-complexity 

b dependence on the environment 

d ?r=dm.ab,hjy "fo^Tard" 

(4, TheTtud“ rable to describe and present m an essay 

b the continuity of class lite 

d the test shown .n Figure 3 4 below to 
Mrs Dorfman learned the matter covered 

determine whether the students had learn 

in the unit 

Mrs. Dorfman's High School Biology Test on Reproduction *

(1) Show, using pictures only and no words, the process of

a mitosis

b meiosis (objective 1)

(2) Imagine that you were a ball point pen. How do you think you might
go through the process of asexual reproduction?

^ „ arcording to Mendehan heredity 


Rgure 



ests Measure 
Objectives, 
Objectives 
icihtate Test 
Construction 




ss!ecr„!!;lr "'“"S Explain how natural 

selechon might have operated to put you out of exlaienoo 

(objectivo 1.2) 

-n (that la, InihTy Sr 

(objectives) 

"" «t:rrr. :rr "r 

Should auggest how you feet about this diflerenoe'''’^ '’°™ 

•-P.ed trum (eh, active X, 

SY^'^StnemataSttn"® rafl“ 7 ''' b«lt 

?n a? ' “.'f®"" °f objectives , a In objectives and 

faction If one were to asHn^ ? ™Portant asset in lest con- 
om “ns'dered tLir tesL'm •’'e illustra- 

sL*""‘«'onsh,p between their oW^'“''‘^"' they would point 
Some teachers attempt to link voal ^ ‘heir test liems 

-SE=S=t:i^aSv' 

2 ttlustral™; hatefe'fuh objectwef th'° “"‘‘i 

- -Crec^; -nyt -Sf , 

The core of the oh ^^rectness can h ^^”2 operation 
"fy on elepZfe- '.yhe behavi“"T^^,^S'«ectiveIy ) 
tmmng four or more aT'“! Pf°hably g,ve a st, H hohavior, "iden 
Was an elephant sm snd ask him or h ^ Picture con 

the action verb " ' Pointing to" ,0 ,1, ’° ‘Odicate which 

■J^eussed .n 1 " The®m °han ics ““ed fe W 

•he objective accomrf"!!”® ehapters, butfte "'"ting will be 
{“‘’her what or s£.‘*“ *"0 purposes (1^'"' *31 

f“eusonit)and(2),, ’ "'f'omeosuref ir reminds the 

■' (provided that the /T''" ^^e hint or her to 

objective in tl,e has buift such fn?" ‘° "'eexiire 

“'=jeetive, the more Naturalty“ 5 ,‘"f°™et.on into the 

"formation 1, Provides forThJ-h'’'^*"'^'' *= 

"e how to mcas 





ure question. Look closely at the objectives and corresponding
items in the preceding examples and you will begin to appreciate
the extent to which objectives facilitate item writing.

The preceding examples are also intended to illustrate types
of tests and test questions that involve paper and pencil to answer.
Paper and pencil is the test medium in which most tests are
written. (The medium of performance testing will be covered in a
subsequent chapter.) We can distinguish between short-answer
questions and essay questions, each representing a different
answer format. Finally, we can roughly distinguish among (1) the
measurement of knowledge acquisition and comprehension, (2)
the measurement of the higher mental processes (application,
analysis, synthesis, evaluation), and (3) the measurement of
affective outcomes such as attitudes, to use the terms from the
taxonomies described in Chapter 2 (pages ).


Box 3.1


CRITERIA APPLIED TO OBJECTIVES

Charles Granger, writing in the Harvard Business Review (1964),
cited the following six criteria to be applied to an objective in the
business context:

1 Is it, generally speaking, a guide to action?
In educational terms we might ask: can the teacher use it as a
basis for instruction?

2 Is it explicit enough to suggest certain types of action?
Educationally, does it help the teacher know how to proceed in order
to facilitate its achievement?

3 Is it suggestive of tools to measure and control effectiveness?
For the teacher, does it help him or her construct tests for
measuring the attainment of his or her objectives? (This is the
subject of this chapter.)

4 Is it ambitious enough to be challenging?
In the classroom, are students motivated to attain it?

5 Does it suggest cognizance of external and internal constraints?
In constructing it, does the teacher take into account his or her
instructional ... and the capacities of the students?

6 Can it be related to both the broader and the more specific
objectives at higher and lower levels in the organization?
Educationally, does it fit into the sequence of objectives?



.s 


Knowledge 
Acquisition and 
Comprehension 


'"Shtrcognlilve 

Processes 


knowTeTge™^smra'^Md measuring 

s.n,c,ed'forZs Xs“mSr,r?” 

have acquired information and ‘J'^gme to which students 

you reexamme Bloom’l TmoI™ It means If 

will see that the fot two levds ^-ou 

prehension These two levels refer t^fh ^ and "com 

tion, and recall of factual infor™!f ic mcorpora 

edge and comprehension of fecCTrema *’' acquisition of knowl- 

considerable effort has been com^b education and 

Two of the four tests presented ^ S measurement 
mng of this chapter measure Tn„'‘? at the begin 

ure“h T “ ^‘■mg and math ^ i “mprehension 

we higher cognitive processeT '^gely meas 

Sme exam '“'’^“’'''''5' ' " 

'o* aTthi?^™"Se from^Th^^^^^^ 
measure wha?,b“i'"«ma such al fcf measur 

-•-.onwiUdet"Sd:«^^^ 

there are tmpor 

cognitive processc^r H® 'mdcrstandmvT “ °f men 
edgeandhavebee”fZK''’'>“®''<otast®t:mf ’’ighar 

evaluation bv Bl '“‘’eled as apphcat,„„ . * °c “smg know! 

S mean 

‘he essay type) 





Short-answer Items That Measure Knowledge Acquisition (Recall)
and Comprehension *


(1) About what proportion of the population of the United States is living
on farms?

a 5% b 15% c 35% d 50% e 60%

(2) The primary germ layer, from which the skeleton and muscles
develop, is known as the

a ectoderm
b neurocoele
c epithelium
d endoderm
e mesoderm

(3) According to Daniel Webster, that which is most inseparable from
"union" is

a "country"
b "liberty"
c "the North"
d "welfare"

(4) If the volume of a given mass of gas is kept constant, the pressure
may be diminished by

a reducing the tempera ure easing the density 

b raising the temperature e » 

c adding the heat . . ,epresents the demand schedule of 

rc'^^m^'!’; ?;rwMchir:s a p^rtecly inelastic demand, 











[Graphs: demand curves (D) plotted against Quantity]


u M=I be living at this hour England hath need of 

(6) ■ Miltonl thou f"""’ , „e,ers”— Wordsworth 

thee, she is a fen of st g , „e,ers," indicates that Words- 

The metaphor ‘ she is a 

worth felt that England was 

d in a generally cPmtipt disease producing bacteria 

(7) A scientist ,ad bacteria free material referred to as sub- 

From them, he exm X was then injected into each 

stance X A large 


Figure 3.5 











animal of group A. These animals promptly developed some of the
symptoms normally produced by infection by the bacteria in question.
Then, into each animal of group B, the scientist made a series of
injections of small doses of substance X. Animals in a third group, C,
received no injections. Three weeks after this series of injections,
and continuing for two years thereafter, group B could be made to
develop the disease by injecting them with several thousand times the
the anln , “ ° Substance X acted upon 

the animals of group A as if it were a

a poison

b destroyer of poison

c stimulator of destroyer of poison

o-brcrx^rar a“r" -- -- 

a a^means cl counteracting the efiects ol the disease-producing 

‘ bacteria or of their 

° product Ol the bacteria 

* From B. S. Bloom, Taxonomy of educational objectives: Cognitive
domain, 1956.

higher cognitive aras^hMm the ™P‘'°'' 0 “orit m these 

oulty in measuring is probahly whaTh' ‘*‘"® S^'^oter diffi- 

dogree of attention paid 1^11"* the lesser 

has manifested itself i„ ,be use of ^ P"™ “teas This neglect 

ent failing of the cognitive domn^^ linuted use of 

in meatittr that has bepn mher 

'heir test-cons. ® ““ “f the th.nkin P™™pted by the diffi 

«-nst.uctie„ skills, .eacLer?”1,'’b™“- ttiPtoving 

“ “hie to feel more 

Answers 





comfortable in aiming some objectives at the attainment of thinking
skills, hence increasing the application of instruction to the
development of thinking. The improvement of thinking may be
ultimately facilitated rather than inhibited by the science and art
of testing.

Sample paper-and-pencil test items for the higher cognitive
processes appear in Figure 3.6. However, it must be pointed out
that thinking skills need not necessarily be measured only by a
paper-and-pencil test. They are often measured accurately by a
test requiring an active, observable performance on the student's
part. (Performance tests will be described in Chapter 7.) For
illustration here, written items have been used. Note that the first item
in Figure 3.6 requires that the student apply what he or she has
learned in algebra and geometry to the solution of a real problem.
The second involv es the analysts of a Pf f '"‘°3 

arguments that do and do nor -PP-‘ ■' ^^h^Vom deTe'^^ into 

the student The fourth calls for the evah.a 

the drawing of a suitable P ^ „,a, ,he student 

lion of clothing matenals in terms ot criier 

must have alreadv oriented more to the process 

The higher mental PJ^ obtained than to the answer or 

by which an answer or solut measure outcomes or 

solution Itself Consequently 

responses based on “’“f P jsaniy completely right or wrong 
score since answers are no obtained 

their degree ot correctnes access to the solution process 

The teacher behind it m addition to the solution 

or the logic or „atenstic that accounts for the rarity 

Itself Perhaps it this attention be paid not only to 

of such tests and which req criterion writing as well 

test Item writing but to answer key 


Items That Measure Thinking Skills *

liar lot exceeds its breadth by 20 yards If 
(1) The length of a . bv 20 yards the area of the lot will be 

each d mension is tncre ^ of the or ginal lot Show your 

doubled Find the s 

work below ^ e none of the above 

b . iZl the PtesWenI ot the Ua,ted Stales sdoM 

,2) Resolved T/iat Itia term 
be extended to six y 


Rgure 3 6 





The Affective 
Domain 


Mark each statement (a-e) below:

A: if you feel that it could be meaningfully used by the affirmative side
in a debate on the resolution

N: if you feel that it could be meaningfully used by the negative side
(Norr yIu '’“t ="15 of Ihe argument 

.rofthl staleir 

a Efficiency Increases wilh experience 

loun°de'd"ih*° “P®" "hich the United Stales was 

® p:;:=sr d^d* - - — 

Presid^Zstor^ierwish " PPtlsfactory 

'PlPPa such as coSenceTrf w^^h 

used lor evaluating the chosel “‘"“‘’""y ’hat can be 

Jane considered Indicate how your cr'u 
[f lead you to make ih^ 

■Pdcilic es possible ,o stating and applying' entena°"' '' 

* From B. S. Bloom, Taxonomy of educational objectives: Cognitive
domain, 1956.

fion selfKionr ♦ ‘^®''®lopinent of awar education to 

"“P* “•'rfoct.on aspiraC motiva 

P ration tolerance interest value 


Answers 

llK f2) a A 6 N c X rf A 
«) drawing should r..n ® ^ e x 

fastness crease resis 






TKe reason yoo na.e ^'.00 r 

consider this place to be a think tank we oon 

_j lilkC but seldom attempt to 
clarification, ,i,er such affectne outcomes are 

systematically determine w ^ jy jg 

being achieved When teach ^ manner by considenng their 

ments, they often proceed systematic measurement 

opinions of students owe shown by the illustrative 

within the affective domain is p 

Items m Figure 3 7 affective items do not have right 

Unlike roost cognitive „„ 3 Itey ,hat indicates the 

or wrong answers Scoring i o„entation chosen by you as 

direction of the item ^mm in Figure 3 7 to reflect 

desirable If one were ^ nil four statements would be 

interest in engineering, for “a interest, all four 

keyed to the njte response 

statements are keyed to 33I strategies are aimed at the 

Many “ntemporaiy ed program developers must careful y 
affective domain Teachers adequate ways to 

consider their affective objecU^^ 

measure these objective emstmg ones modified Such test 

tests will have to . Jsenbed m later chapters, calls for the 

development, as will be ,g,Pg5 35 opinions, attitudes, 

fndvrrforThteb absolute right and wrong answers do n 






”5.s ”srri“.- ,»»« »"»•■* 

lorted responses 


Figure 3.7 Items That Measure the Affective Domain *


(1) As you read each item below, underline one of three letters:
L if you would like to do what the item says
N if you neither like nor dislike what the item says but you would still
be willing to do it
D if you dislike what the item says and would not want to do it

L N D a Sing songs at parties

L N D b Sing in a glee club, chorus, or choir

L N D c Play in an orchestra or band

L N D d Make up tunes to hum, or compose music


(2) You read short stories or novels (other than for school)
a 1 never (if you choose this answer, omit items b and c)
  2 occasionally
  3 frequently
b 1 with little or no enjoyment
  2 with a fair amount of enjoyment
  3 with great pleasure
c 1 just for the story
  2 paying some attention to plot and characterization
  3 making a detailed examination of the idea and structure of the
    work


(3) The following statements represent opinions about various phases of
school life. Since there are no right or wrong answers, you are to
express your own point of view about the statement. Underline:
A if you agree with the whole statement
U if you are uncertain how you feel about the whole statement
D if you disagree with the whole statement

A U D a It is better for seniors and freshmen to eat at different
lunchroom tables.

A U D b Seniors and freshmen should not dance with each other at
school dances.

A U D c A capable freshman would make just as good a student
council president as a capable senior.

A U D d Senior men and freshmen women should not have dates with
each other.
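Because affective items have no right or wrong answers, they are scored against a key that records the direction the teacher has chosen as desirable. The Python sketch below illustrates that idea only; it is not taken from the text. The names MUSIC_KEY, CLASS_RELATIONS_KEY, and score_affective are hypothetical, the keyed directions are assumptions made for the example, and counting one point per keyed response is just one possible scoring rule.

```python
# Hypothetical key for item 1 of Figure 3.7, assuming the teacher wants to
# measure interest in music: every statement is keyed to "L" (like).
MUSIC_KEY = {"a": "L", "b": "L", "c": "L", "d": "L"}

# Hypothetical key for item 3, assuming the desired orientation is
# acceptance between seniors and freshmen: a, b, d keyed to "D", c to "A".
CLASS_RELATIONS_KEY = {"a": "D", "b": "D", "c": "A", "d": "D"}


def score_affective(responses, key):
    """Count the statements answered in the keyed (desired) direction.

    responses: dict mapping statement letter -> underlined letter.
    key: dict mapping statement letter -> keyed letter.
    Unanswered statements earn no credit.
    """
    return sum(
        1 for statement, keyed in key.items()
        if responses.get(statement) == keyed
    )


if __name__ == "__main__":
    student = {"a": "L", "b": "N", "c": "L", "d": "D"}
    print("Music interest score:", score_affective(student, MUSIC_KEY))
```

The same function serves both items because only the key changes; a different desired orientation means writing a different key, not a different scoring procedure.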





for the subjects of reading and math. Note that each statement
in the content outline is a statement of an objective only in the
form of behavior or action. The statement of the conditions and
the criteria has been left off, as they are typically in a content
outline for purposes of brevity.


Reading Content Outline 

(1) Recognizing the sound of final consonant digraphs 

(2) Recognizing the sound of initial consonant blends 

(3) Recognizing the sound of final consonant blends 

(4) Identifying vowels modified by r in words 

(5) Identifying vowel diphthongs in words

(6) Classifying singular and possessive nouns

(7) Classifying adjective endings

(8) Classifying irregular verbs and verb endings, present tense

(9) Classifying compound words

(10) Classifying contractions

(11) Identifying synonyms and antonyms

(12) Identifying personal pronouns

(13) Identifying words using context and configuration clues

(14) Restating sequence of details

(15) Inferring main idea


Mathematics Content Outline

(1) Identifying place value (1's, 10's, 100's)

(2) Adding columns of numbers without carrying

(3) Recognizing applications of the associative principle

(4) Subtracting two-digit numbers without borrowing

(5) Recognizing applications of the distributive principle

(6) Identifying common fractions

(7) Identifying improper fractions

(8) Identifying place value (1000's, 10000's)

(9) Identifying decimal fraction values to (hundredths)

(10) Reading and writing decimals

(11) Converting decimals to fractions

(12) Converting fractions to decimals

(13) Adding two- and three-digit numbers with regrouping

(14) Stating multiplication facts

(15) Determining the product of two-digit numbers


('“’’O descnbed m the following chap 

lIS (o?l,s 1 \ *= P^^P=«tion of a content out 

htie (or Us, of objectives m bnef form) that tells you what it is 





Sfeoufllv L T “ objectives 

per Obfectoe “r; '™6bt.ngs occur, the number of items 

mh=rword^arsr’ ■>’« "‘’j- 

have approximatelv .r “ "'Oighting of 3 should 

wetghSgT; “ "’“5' items as one with a 

aimed at insuring the appronr/^™"**'?'^! ^ “ procedure 

■>propriatenesf“ is S 'T °f 

‘'™o“rt" Appendix 3^"''”’'"“' 

are ready to be embodied intrt'‘"'*''‘'' ‘^“"^bions and criteria 

Two deai With the pr:ptfi.:rr„;rstfa:Le?timt':"^^ in Part 


Box 3.2 




^ ^ ruK SUCCESS 

I™""'"”""' OMeelives (1962), 

indieat? common ways of delm ®®cepfa6/e periormance. 

'"“'“""S °' acceptable performance are 

3 2 “ of c’liTrecr"''® 

^ *he minimum oercpniea '^®sponses, 

^ ™ «-ac, responses (e g , 

accuracy (eg, ,be 

-:«^Sr--^--fLomp,exobtec.,ve.ba.,s 

'"a student must be iihi„. 

~ with a 

demonstrate his o2h '** *wenly-four notes^Th* ®'^'®aa bars 

'“"-Pcsition Within 's Id comp, ate :s'o^t“''°P'""^ 





Additional Information Sources 

Ammons, M. Objectives and outcomes. In R. L. Ebel (Ed.), Encyclopedia
of Educational Research, 4th ed. NY: Macmillan, 1969, 908-914.

Krathwohl, D. R., & Payne, D. Defining and assessing educational
objectives. In R. L. Thorndike (Ed.), Educational measurement, 2nd
ed. Wash., D.C.: American Council on Education, 1971, Chap. 2.

Mager, R. F. Measuring instructional intent. Palo Alto, Calif.: Fearon
Publishers, 1973.

Tinkelman, S. N. Planning the objective test. In R. L. Thorndike
(Ed.), Educational measurement, 2nd ed. Wash., D.C.: American
Council on Education, 1971.





Self-test of Proficiency


(1) Listed on the left are three map-reading test items. Match each one
with the shorthand form of the map-reading objective on the right
that it measures.*

a To go from the post office to the school (as shown on a map),
would you go north then west, or north then east?

b On this map of the schoolroom, point out the fire alarm, piano,
and chalkboard.

c With a set of blocks, construct a model of the street on which
your school is located, and then position your school on it.

i Use a given material to create a representation of a specified
area.

ii Identify map symbols on an outline map.

iii Use geographical directions to designate the route between
given locations on a map.

iv Identify objects that correspond to symbols on a map.

foll<.wmg„ems would 

= power ,he government ,3 oonst.tuted ,n the 

Power,' the jud.o.al 

FALSE 

Items v.,th Pn™e'rftLt°are '> Possible to prepare test 

a affective domain fight or wrong? 

J J'Sher cognitive processes 




graphy) K~9 





(4) a In the classroom, tests are most often used to measure the area
of higher cognitive processes.
TRUE FALSE

b To score an answer to an item that measures evaluation, the
teacher needs access to the logic behind the answer.
TRUE FALSE

(5) Into which one of the areas (knowledge acquisition, comprehension,
application, analysis, synthesis, or evaluation) would the test items
a, b, and c in number 1 above be best classified?

(6) Into which of the areas listed in item 3 above would you classify the
following test item?

I like this textbook very much.
strongly agree   agree   disagree   strongly disagree

(7) Write an objective in shorthand form for the goal: to know who the
United States senators are from your state.

(8) Write a shorthand version of the following objective: Given a
familiar melody, the student will create an accompaniment for the melody,
such accompaniment having the quality and mood of the melody,
using the appropriate instruments.

(9) Given the shorthand form objective, change a flat tire on a car,
prepare a content outline with at least three tasks.

(10) Given the shorthand form objective, define, describe, and
compare fascism and democracy, prepare a content outline with at least
three tasks.



part two/Constructing 
a Test: Teacher-built Tests 



chapter four / Short-answer
Items to Measure Knowledge
and Comprehension


OBJECTIVES 1 Identify six types of short-answer items:
unstructured, completion, true-false, other two-choice,
multiple choice, and matching.

2 Identify and state rules and recommendations
associated with the construction of each of the
six item types.

3 Construct sample items for a given objective
using each of the six item types in accordance
with the rules and recommendations.

4 Distinguish between the different characteristics
or features of the six item types and the testing
situations in which each of the item types is best
used.



TYPES OF SHORT-ANSWER ITEMS


This chapter deals with the measurement of knowledge and
comprehension, areas of primary interest to the teacher. The teacher
typically monitors the extent to which knowledge and the comprehension
of that knowledge have been transmitted to students. Since knowledge
and comprehension account for the majority (but not all) of a teacher's
objectives, such testing becomes an important activity. It enables the
teacher to evaluate students' progress, diagnose their weaknesses, and
get some idea of the effectiveness of the instruction.

We will assume at this point that the teacher has completed the first
three steps in test construction, that is, that he or she has (1)
specified the goals, (2) put them into a content outline, and (3)
written them as objectives. In originally specifying his or her goals,
the teacher should also have decided which objectives required knowledge
acquisition and comprehension, which required thinking processes, which
involved attitudes and values, and which involved behavior change. In
this and the subsequent three chapters, the construction of test items,
the next step in the process, will be covered. This chapter, in dealing
with knowledge acquisition and comprehension, will focus exclusively on
short-answer items, the types of items most commonly used for the
measurement of these processes.

Short-answer items are basically of two kinds, free choice and fixed
choice. Of the free choice formats, one is the unstructured format, the
other the fill-in or completion format. Fixed choice formats include
true-false, other two-choice, multiple choice, and matching. Both free
choice and fixed choice items have previously determined correct
responses. However, in the free choice type the student is not given
alternatives from which to select the correct response, as he or she is
in the fixed choice type. In this chapter each type of short-answer item
will be described and illustrated, and some guidelines for the
construction of each will be offered.

Short-answer items typically ask students to identify, distinguish,
state, or name something. Such items may also, particularly in math, ask
them to derive an answer. In the free choice format, the measurement
involves asking students a question that requires that they supply
specific information or knowledge, thereby indicating acquisition of
that knowledge. In the fixed choice formats, the measurement basically
involves giving students alternatives and asking them to distinguish
between correct and incorrect ones (that is, to recognize the

“"“.ferr i’?' r?' “■> .aLn^^nd, 

.nformation or knLTdg^ cXd'for'mthfdem 

distinS" ™e 'tr'nti,?'''” ‘'‘n‘ “> ‘dentify 

knowledge acouisitioTi something and hence demonstrate 
ured by means of Ze 

forniatLhatwdfhrisltdrr 


unstructured format 


The unstructured format utilizes a question that can be answered by a
word, phrase, or number. Some examples are given below.

Examples 

What phrase did HaraL hormone ACTH’ 

• Which state in the United States produces more copper ore
than any other?

• H+H-/.0- 

• Who wrote the poem "Ode on a Grecian Urn"?

write fthf. to * '■^presents a reasonnTsi ® ^ multiple 

to present! ^ to thmV 

kind S ueL i, “ “n be accomm ‘f 1 °^"''™'“"'= uu™>=rs 

■uus, a "Uiap figure rSanhf'* ‘o ‘he 

~rihei:r; 

1 uny number of answers ®°™u unstruc 

-ay 


Answers 

' “ VI6 Ksats 





resemble the correct one to some degree. Consider the item,
"What chemical is often added to drinking water to help prevent
tooth decay?" Students may answer fluorine, fluoride, sodium
fluoride, stannous fluoride, fluride, among others. In such cases the
teacher will have to decide what the student had in mind. Mind
reading adds a difficult dimension to item scoring.

The unstructured response format is suited to the measurement of
specific knowledge, most commonly in areas such as math and history.

writing the Item I"i-vrr,Tf~ ^ 
the first point to keep m mind possible Reducing 

reel response will obviously simplifies the task of scor 

response possibilities, which o . J hrief and to the 

mg! IS often facilitated by e respo«rand only one will 

point, and of Scoring /further aided by snppfy 

“^r/eitXn rr/kW .dUe^ ’’y 

The items themselves should be written in the simplest language
possible so that reading them does not become a task in itself.
Consider the two examples below, written for the objective:
distinguish between a ratio scale and interval scale in terms of
their qualities.

• A ratio scale has a particular feature that distinguishes it
from an interval scale. What is that feature?

• What does a ratio scale have that an interval scale does not
have?

Though the answer to both is "a fixed zero point," the second item is
simpler than the first and more clearly illustrates one major
characteristic of an unstructured response item, namely, that it is in
question form rather than statement form.


COMPLETION FORMAT 


The completion or fill-in format is also a free choice format in that
the students must construct their own response rather than choose from
among given alternatives. It differs from the unstructured item by
requiring students to complete a sentence from which a word or phrase
has been omitted.





Examples

• The man who discovered Florida while searching for the
"fountain of youth" was ________.

• Boyle's Law states that pressure of a gas multiplied by its
________ is equal to a constant.

• "Give me liberty or give me ________" was the pronouncement
of a famous American revolutionary.

• Among hormones secreted by the pituitary gland, two are
________ and ________.

• A fixed zero point is the characteristic that distinguishes a
________ scale from an interval scale.

• ________ produces more copper ore than any other state.

advamaJeTfn?rsadvanlaT‘’''’‘i,°'' °f ‘f"= 

while being somewhat more* '•'a onstructured response format 

must give !uffic™rt c|ueTs' ' ">= 

many to be unchallenging the wo"? ““'’’Soous but not so 
ticularly cntical Compfetion iteml"® ‘^“mplol'on items is par- 
easier to score than unstructured?- f’.''°P'''ly wntten are 

advantage of being quite ? ‘ho dm 

ever by requiring that the students °™ S''‘"”mar How 

hm type of itemVses a chine answers, 

structured items can both be inclLed" *1"" ^“™P’ohon and un 
udent a change of pace il^ to offer the 

■S to be tested ”"°"”'''’«e specific fact learning 

Writing the Item r 

^Jrtke a balance between havmp ‘terns, the key is to 

II “"d Item be 


— .1..* 

the evolutionary thoor, c 


"= leoa vole™ 

''as.pressm TSH 





The first example suggests a variety of answem There *e ^nswe 
will depend on which theorist the student chooses for the first 
blank, thus making the item indefinite by inviting a »td”“ge of 
possible responses The third illustration 'J, ' f 

thinking processes ,l.„ ton manv nor too few clues 

;:m:v!rhts"tL'p1^ •» themdefimte article 

(a, an) 

• A subatomic particle having a negative unit charge and
negligible mass is a (an) ________.

By using a (an), the second illustration avoids indicating to the
student that the answer begins with a vowel.

As a final point, completion items should have a single answer. Items
that prompt students to give a range of responses are much more
difficult to score than those items that elicit the same correct
response from many students. It is often helpful to write an item
originally as an unstructured item and then rewrite it as a completion
item. Remember that completion items are only useful in measuring the
acquisition of specific knowledge. Consider the following examples:

• Ebbinghaus an Austrian py 

• 

^ .t eliciting the same general 

Although both illustrations are second, remember), 

the first, an answer team might be chosen 

For either illustration '“™7nons.deted incorrect 

swers but could legitimately be CO 





Sider'tw d-fficulty writing a completion item con 

s h/ V The first is to include only a 

occur '>'= ambiguity that may 

mg TorLo m f ““P’"*'™* The second is to consider offer 
mfo a multiplier" 

are all considered'™” “ common They 

vapors inertgases,;;;^,) b-o’og'aal 

• that the pressure of a gas multiplied by 

constant emperature weight) is equal to a 

resents a'SnablieffinIis““™"/“'' completion Hems rep 
and sconng difficulty that often arr ^ l«ni»tjng the ambiguity 
tiple choice Items are discusred m d.TiT P°™P’a''on items Mul 
m detail beginning on page 90 


TRUE-FALSE (YES-NO) FORMAT


Some short answer it^m 

elude tb^ip™:;*:!;:^ 


was 


Examples 

SviedbVS"^^""u™ — 

• ™aa.°Mip^^y^ureld desert 
m Australia '’een found 


true FALSI 

true FALSI 
true FALSI 

true falsi 
true falsi 





• Below is a list of plural animal names. Circle YES for
those that have been done properly and NO for those that
have been done improperly.

    oxes       YES   NO
    deer       YES   NO
    mouses     YES   NO
    bear       YES   NO
    monkeys    YES   NO


Pros and Cons. One °lf nem^to write and can 

fact that they are perhaps the to write is due to 

be answered quickly by statement that is either accurate 

to their simphcity-just a to the fact that the 

or inaccurate Their easiness to answer « ou^^^ „ 

student "“d only read the s a teaching students 

But what about the read these items’ Another 

information that is of ambiguity that may be con 

difficulty IS the significant amount ot am 

tamed m these items 

. When a plane crashes ,L 

Canadian American border,^ TRUE FALSE 

survivors are buried 

How many of you read survivors are character 

that It contained th® VJ ^ correct answer is false Or, 
istically not buried Thus, m 

. Earlyinhiscare^r 'ViURogem^^^^^ ^^^SE 

•■I never met a ma 

Did Will Rogers make ^ statement been altered’ If Will 

of fact Free choice >«^j;,^Mse item the fact is g«en mjhe 
dent to recuH them inaccurate form and 

student in either ac 


tnswers No Yes "o Yes 

:alse Tree False False False 






lhcmcasTcmcmof™^“'^Vs‘™'^^ to work well m 

betu^'" ""'"‘"'"’S the student ‘*'^'=''™“t>''on between 

Fin ?,‘^*'^{”ents of correct and men or discnminate 

ucaknes ' « •'■<= ™tter of ■"'=T'Fetat.on 

fiftv c!r **^'^~*^“**‘^ »ems When sIuh"^ Perhaps the biggest 

beVbvT”"^-"e"eht feeLTJe''H 
measure sshat°s!irj’™'' Th^pur^^^ 

The .arge't;: tj--re capable of, 'no^tToriulkv 'th ‘ " 

“ Ereat manv ”” tisuallv nnlv u„ s^e 

eecounl for a 00 ^" a mmd, then *'5' using 

penicularly for thlf tT'^’u ""mber of pomu on Eucssing can 
'"J' them „ c ^"Shter students sTn V '™=-folse test, 
" "> make educated guesses “^-= of olues 
7«bru„.„ca„b„ „ This sveakness can be 

‘"'he.p«,f,er^,hea,encc,i„„o, 

and not of knowledge 





compensated for, in part, by imposing a penalty for guessing
wrong, for example, deducting a half or a quarter of a point for
each incorrect answer.
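The arithmetic of such a penalty can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name penalized_score, the 40-item test, and the sample counts are hypothetical and do not come from the text, and omitted items are assumed to carry no penalty.

```python
from fractions import Fraction


def penalized_score(num_right, num_wrong, penalty=Fraction(1, 2)):
    """Return rights minus a fixed deduction for each wrong answer.

    Omitted items are assumed to be neither rewarded nor penalized.
    """
    return num_right - penalty * num_wrong


# Hypothetical student: 30 right and 10 wrong on a 40-item true-false test.
half_point = penalized_score(30, 10)                     # 30 - 10 * 1/2
quarter_point = penalized_score(30, 10, Fraction(1, 4))  # 30 - 10 * 1/4
print(half_point, quarter_point)
```

With the half-point deduction the hypothetical student's 30 raw points fall to 25; with the quarter-point deduction they fall to 27 1/2. The larger the deduction, the less a student gains by answering items blindly.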

Writing the Item. In wnting true-false items be careful 
neither to give too many clues nor to build 
™le is to avoid 'f-- "\arTm M, and^^^^^^^ 
yormayMImtoVehabn^oJu^^^^^^^^^ 

—a habit students may g ^„l,o touches his nose before 
rowmra7nch"of °h“ ^tche^r who turns his glove before throw 
mg a curve 

• The market value of gold always exceeds
the market value of silver.                    TRUE   FALSE

• The market value of gold exceeds the
market value of silver.                        TRUE   FALSE

• Today, the market value of gold exceeds
the market value of silver.                    TRUE   FALSE

The first illustration is broad and sweeping, while the second is
ambiguous, not telling the time to which it refers. The third
illustration is the most suitable of the three as a true statement
because of its specificity. The second would not be recommended as
either true or false. A suitable false item then might be

• Today, the market value of silver exceeds
the market value of gold.                      TRUE   FALSE

A helpful practice in half of them around to 

, rue , iLs and then afterward turn 

make false items This sy |^,ructure and also P™^ “ 

Hppree of uniformity of true and half are false (thus 

rest on which half of Zl^Zt"cZs 'Jo!d 

"not" does turn ® ‘““jjs either a clue or an ambiguity 

a false) but often also adu 





■n\aciioicc 


specf'o one 'IZy, re 

strong guessing dS' 

mclude on/y rimgfe mnioTpomf/ ““1 “nambi^ously as possible 

relationship between two facts “T' 

for example ttship that the student must judge 

• Maid Manon Little John, and Brother 
■s a bad“Lm"b 

tionship betweerc'haracte™* rbook h 

Frwr Tuck with Brother Tuck It w^ replacement of 

• Marion, Little ;l nd P " “ ^ 

are all character, , 

bef ““aS^Tf 'tte'ob™® objective 

'he petal and the sepal of a o' ■'"t ""> dtstingu.sh 

fol o!u ”''°”a. and the h»e°r"h“' "’ ‘*'0 f°™er 

™ tag Item might be written °f ">« "alyx, the 

’ none'’r?i'al^‘‘ Parts of the 

true false 

TWO-CHOICE CLASSIFICATION FORMAT

Examples examples below 

used as verbs and draw a 

;&ps 


S myself 
^ Were 


such .s'a 





• Use a or an before each word.

    ____ oak          ____ hour
    ____ ear          ____ mountain
    ____ uniform      ____ orange
    ____ umbrella     ____ letter


• Teddy hasn't got | ^ | 


money 


Pros and Cons T.o^ho.ce 

abo\e provide teachers “ “^^5 and Lo add a little varia 

fitting test Items to , than the trui^false format 

tion to a test These formats a ,han multiple choices 

and may require less ^ the task of classifica 

(meaning three or more choi j categories— the presence 

tion can normally be cast into ^ European country 

of a quality vemus Its absence in . ^ so 

versus a non European ^tegones your item becomes a 

on As you add <=''‘®*‘2““°tually a matching item However, the 

multiple choice Item and eventua y^^^ ^ ,^^t which 

contrast between that v^icn n 
does not calls for the tvv 

•^raXctr^each^ta^^Hatisasm 

a Atlanta , Madison 

b Birmingham > New York 

c Chicago Pittsburgh 

d Denver conveniently fits the task of a 

one v^rmbfe formats the f 

,e tto^lSLlV^-^Si. “ ^m a“d t^d^ 

ual knowledge is susceptible to ambiguity with 

problem IS often less aev 

uh presented maser 


ranifora, meunta.n .nd letter any check 


Answers 
Underline b d < 


f h all 





«ork!wdl ^ Remember that the twaehoice format 
£°rr ,0 ie ^ ‘he cate 

%saUe catTpot af'"' Z™™ other potentmlly con 

instances or "“fl olassified clear 

traled.nafieldsuchasmathemato''*”''^ 

X^a;;rm“reX:f " "here nght and 

betweenftt Smt ™ considered causes of the War 
slavery c 

states' nghts economics 

Petscna, conflicts tomT^Tmerfercr" 

sonal conflicts) whlie^oth^nernm' 1 ''“'' '"“"■”8 (eg - P=>- 

meant to suggest that the two^ihoice '"‘‘cism is not 

potentially confusing areas bm fha! 1=0 “sed m 

' ntt” 

Seh malrn" *'>oW be as'^ranT® “ 

It is^ofien"'' "te order andT “ s° that 

"tc ao .tern TC 


Answers 





tinguish ductless glands from Is this nonglands or duct g'ands or 
both’ This decision will form the basis for the choice of n 
exemplars The item might come out as follows 

• Circle those parts of the body that are ductless glands 
Pituitary thyroid kidney 

category, for instance duct ess g specific 

The alternative to the S™- ,, Jmy (depend 

category or can simply be glands can be contrasted 

mg on your objective) Thu ' 8 „,at is not 

with glands having due s or „g|ands) When the categones 

a ductless gland (i e , glan ■ „ ^^„ll also be broader and 

sa; ... — - 

s ^r 

Keats Whitman 

Frost Baudelaire 

^ ' Wordsworth 

, 1 , have been limited to poets The 
In the second example f" j,s„ngui5h between nineteenth cen 
smdent must therefore ody^tm^^^ ,e he or 

students to make (wh.cn 
objectives) 



Answers 
Pituitaiy thymus 
delaire 


thyroid 


adrenal 


cortex Keats Wordsworth Bau 



90 


Short-answer Items 


of ^ “-I stimulating means 

rLTonahlv? acquisition. It challenges teachers to be 

eSmuhrsI aoH (Particularly in choosing non- 

S thev ar?,„- r'"*"' =>ware of 

What they are trying to measure {i.e., the objective). 


MULTIPLE CHOICE FORMAT


A widely used short-answer format is the multiple choice, which has
been used more in published than in teacher-built tests. A multiple
choice item typically offers from three to five alternative answer
choices, one of which is correct; the rest are incorrect choices.*

Examples 

™™d'd 10 ntorry oho of 

"sr-ar“" 

O- himself married to 
him from marryine 
f himself married to 
him to have married 

"al fnction between i, and tL 



*Incorrect answer choices are sometimes called distractors.








e None of the above was the reason for 

• "In a flash it came UP°" ^° „eai,h With the growth 
advancing s value, and the men who work 

of population, pnvilege In allowing one man to 

It must pay more for t P J ^en live, we 

own the land on andjro ^ 

have made them his subtle alchemy 

as material P™8‘'"” realize is extracting from the 

that m ways country the fruits of their weary 

masses m every ^ 

The person most likely to have written these words is

a John Jacob Astor
b William Jennings Bryan
c Thorstein Veblen
d Lincoln Steffens
e Henry George


c s e 








]lem n , illustrative 

the t'e? ' ','r '”“'•'•>’■««) "as chosen by about 60% of 

"ntten S toUoI!s“l Biven sentence would be re- 

resulur? entSL (b Choice u would 

and e result Z Srammatically incorrect Choices d 

Choice b results in a wordv sem 'ha meaning of the origmal 
ate tone In order to «:»i ^ a ^ somewhat mappropn 

alteraatue phrase and times, once for each 

Obviously, such an activiiv them to find the best one 

facts, comprehension is more than mere retention of 

choice Items haxe the potenhann^^ ^^ll-constriicied multiple 
application tneasure comprehension and 

■s based partly on the simplicitv of tb degree of success 

■mplausibiluj of most of the c^oiclf ‘’““'J™ P^«'y ™ ‘he 

«ming of multiple choice question; P""’™'=‘'y c and d A short 
« that the correct utisiver can sortie^ 'a pchieaement tests 
=>:> prior famWedge or 'nSrncnl “ without 

00 many clues or too ma“y •hero ere 

‘ '",‘”'0 ‘he aptitude of the «ud “ because an 

chosen by only about a'uTrleToVrhl'’; ‘"“'-“vo item, was 

ZofZ'rr^^ *= ^"eat^iVL thr'’' heiiause 

t«p/e choir ^ Hoviever, a certain A unfamiliar per- 

‘omclimcs be m'im°"/a ™ 

test n.. V "'‘u*unzcd by a built ,n Guessing can 

"rong ansucm l"®*" ”"o-quailer nmlib"'^'’ scoring the 

S;f':bTb'f'i‘" ,eorc and 

cal usclVha„ “ '“"‘“'"doub ™dW responses’- 

difllcult iic^ uther forms of commer- 

responscop, mu’ r^f' because fn ^hey are, how 

P"ons. and (2) ‘ ey demand plausible 

’"--"-oprcfimmory 

'iiriculi, as 




the 





testing, analysis, and refinement in order to sharpen the distinction
between the correct answer and incorrect choices. Both of these
requirements have tended to limit their use by teachers.
Unfortunately, when teachers have used them, the items often have
not undergone the kind of scrutiny and refinement required for
maximum effectiveness.

Writing the Item. Perhaps the most critical part of the construction
of a multiple choice item is the selection of the response
alternatives: the correct answer and the incorrect choices. The
difference in difficulty between writing a free choice item and writing
a multiple choice item lies in the selection of incorrect choices.
These wrong answers must be plausible to someone who does not know the
answer, yet clearly different from the correct answer. They should
represent errors that students are likely to make if they have
incomplete knowledge or faulty comprehension. This is a tall order.
Item analysis, as illustrated in a subsequent chapter, can help
considerably, and the process can also be facilitated by keeping in
mind the suggestions below.

^ L“d use them eTfbasts for wntmg the incorrect response 

Not only should sSd’also°h^^^^^^ Tagnose the 

from incorrect performanc • " ^ ^jents have acquired, Wrong 
kinds of incorrect no'”"" common errors best accomplish 

answers or dietrajMors that mp and 

these purposes, those of effective or working response 

Simply serve to reduce , 

choices provided For >4, ^nd is 

. The lowest comm ^ 4 

good a ^ h 5 

6 24 c 12 

c 12 d 2 

because all four choices represent 

The choices to the left are 4 tcvo of the fractions, and two 

choices on the right are 
IS c ) 





poor 


better 


(2) Construct incorrect choices that are, in fact, incorrect 
It IS equally undesirable for all or nearly all students to select a 
particular wrong answer as it is for no one to choose it If an incor 
reel choice is over chosen that means it is probably too close to 

eSr^rV' b ^ answer/ 

meomet chn T'f *e particular 

rfXwtTamUs' 

poor • Twenty Thousand Leagues under the Sea is considered
to be

a an adventure story
b a science fiction story
c an historical novel
d an autobiography

better • Twenty Thousand Leagues under the Sea is considered
to be

a an adventure story
b a tragedy
c an historical novel
d an autobiography

be «n/d«ed"SmeX‘ modem ‘‘ 

example choice b is chanTedTo‘’Xed;'“'“‘= the ' better- 

length, com'pSu^an™™'''* comparable in 

answer ">'• Bcammatical form to {he correct 

know P“^«TofaS uem".*' “'’^‘"■efon of response 

plesT"”"'"™ 

Icng an?Sri'"«"y '•■ITcren^a^T'’'^ ”-e com 

poor • When we say that a court possesses appellate
jurisdiction, we mean that it

a must have a jury
b has the authority to review and decide appeals
c can conduct the original trial
d can declare laws unconstitutional





better • When we say that a court possesses appellate
jurisdiction, we mean that it

a must have a jury
b can review the decisions of other courts
c can conduct the original trial
d can declare laws unconstitutional

(4) Write the questions and choices in language that your
students can understand.

nlv if that is your purpose Do not intro 
Construct a reading test only ges 220-23 and 248 

duce the vanous forms of bia intelligence, test wiseness, 

by writing items knowledge based 
reading comprehension, or knowieog 

of'hTfonowtng ts a statement of the Yerhes 

Dodson Law’ ,„„.si,ion of habit strength is a 

“ SrerieX'g function of delay of rein 

forcement mt^nuitv of noxious 

^ IXral-"anSr::.sit.onofh^^^^ 

IS U shap'd stimulus intensity and 

maximum mean physiologic response is posi 

live u"‘’ ‘‘"“'■^^tinction rate is more rapid fol 
" ^.wm/rauo -heduhng than following inter 

val scheduling vprkes 

, Which of the followtng is a statement of the 

Dodson Law’ slowly when rewards 

“ Learning o immediate 

a« delay'd 'U^t 3,ress is 

t Learning » K intense 

, S:■S:mm^Sra stimulus, the greater the 

bodily learning occurs more quickly 

d Pr'^re^iom tsrrds^have occurred regu 
^riy"rather than intermittently 


better 





In both examples b is the correct answer, but in the first the
excessive use of complex jargon clouds the purpose of the item. Good
measurement is not pedantic. Its purpose is to measure objectives
of instruction rather than vocabulary.

(5) State your items so that there can be only one interpretation
of their meaning.

It is important to be specific, as the examples below illustrate.*

poor • The shortest day of the year is in

a. March.
b. June.
c. September.
d. December.

better • The shortest day of the year in the Northern
Hemisphere is in

a. March.
b. June.
c. September.
d. December.

The second question quite clearly requires d as the correct response.

•he use of such 

The above kind of ^ whenever possible, 

use often increases determiners. Their 

tjualifying an otherwise plausible ni, guessing by dis- 

ticular have learned that thines rar'l^' students in par- 

ne\cr and that many test build ^ ^ occur either "always" or 

lioidT 'he •rack I.^ let "urds to try 

a'oid choices that contain them! * us a clue to 

Bn°dte/- h"'’ ' upon 

b. u mo’’T'‘'“.' 

c. the city shares in city. 

^ created things. ^ natural beauty of all 

natural 





better • The theme of Wordsworth's poem, "Composed upon
Westminster Bridge," is

a a city is more beautiful in the morning than at night
b the countryside is more beautiful than the city
c the city shares in the natural beauty of created things
d the city is not an attractive sight
e the people of the city believe in its natural beauty

The presence of "always," "all," and "never" in a distractor often
serves to make the choice wrong, thereby reducing its value as a
plausible distractor. In the "better" example, these words have been
eliminated from both the distractors and from c, the correct response.

(7) Do not provide extra clues to the correct answer within
the item statement itself.

poor • The Missouri city most often referred to as the
"Gateway to the West" is

a Topeka
b Kansas City
c Chicago
d St. Louis
e Des Moines

better • The city most often referred to as the "Gateway to
the West" is

a Topeka
b Kansas City
c Chicago
d St. Louis
e Des Moines

^ *^"^*des the clue— a Missouri city In the better 
The poor sample prow^^^^ , g, jo,,,,. .s asked about 
example, only the ^ single item (except 

(8) Do not test more 

as noted in poi for 

It .here are two maltes’an item confusing because 

multiple points m a sing p„,„, answer 

the student does not K 


poor 


better 





poor • The principal value of a daily program of exercises
is to

a eat less
b develop musculature
c increase intelligence
d keep fit
e use up extra time

better • The principal value of a daily program of exercises
is to

a eat less
b develop musculature
c increase intelligence
d help make friends
e use up extra time

In the poor example, either b, develop musculature, or d, keep fit,
can be considered the correct answer. The better example avoids this
by replacing one of the acceptable choices, d, keep fit, with a clearly
unacceptable one, help make friends.

whTrr:^'p?op'“at rj;:? ■‘- 

The socI?H '' 

both o and ™?«‘of?h''Ltvf" •“ 

c The following is an ex^fe ^ 

develop musculature 
c keep fit 

^ ® anTt'hc^bL™''®' 

Si7aS “1!’" 

more thn ^ dearly indicated unamV, answers as 

the s?id\?tr b? sr;; 

(10) After the test items are written, vary the location of the
correct choice (that is, its letter designation) on as random a basis
as possible.

It is only natural for test takers to seek whatever clues a test might
offer. A response pattern is one such clue. They might note that e
has not occurred in ten items and use that as a basis for guessing
e on the eleventh item. By being as random as possible in
the location or letter designation of the correct choice, you render
this kind of pattern studying a superstitious behavior.
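One way to place the correct choice at random can be sketched in Python. This is a minimal sketch, not a prescribed procedure: the helper name shuffle_choices and the seed value are hypothetical, and the sample item borrows the "Gateway to the West" choices used earlier in this chapter.

```python
import random


def shuffle_choices(correct, distractors, rng=random):
    """Shuffle one item's choices and report the letter of the correct one."""
    choices = [correct] + list(distractors)
    rng.shuffle(choices)  # every ordering is equally likely
    letters = "abcdefghij"[:len(choices)]
    key = letters[choices.index(correct)]
    return list(zip(letters, choices)), key


# A fixed seed (an arbitrary value) lets the same test form be regenerated.
rng = random.Random(1975)
labeled, key = shuffle_choices(
    "St. Louis", ["Topeka", "Kansas City", "Chicago", "Des Moines"], rng
)
for letter, text in labeled:
    print(letter, text)
print("keyed answer:", key)
```

Because each item is shuffled independently, the letter of the correct choice on one item tells the test taker nothing about the next. The teacher should keep the returned key for scoring.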

(11) Avoid items that provide clues to the answers of other items.

This is not an uncommon problem. To avoid it, read your test over
looking for instances of clues from one item to another. For
example:

• Rien means

a nothing
b something
c laughing
d otherwise
e then

• Rien n'est parfait

a all's well that ends well
b nothing is perfect
c reading is fun
d everything is lost
e ice cream is good

When a student sees two related items on the same test, he or she can
use one to figure out the answer to the other. In these examples, he
or she could observe the word rien in each to get answers a and b,
respectively.

(12) Response alternatives should be short, unique from one
another, and not clued by the question.

The "poor" example on appellate courts illustrates a question cluing
the correct response by means of the connection between the words
appellate and appeals. The "better" example overcomes this deficiency
by substituting decisions for appeals. Also avoid answer choices that
overlap in their degree of correctness.

poor • An isosceles right triangle contains

a a pair of equal sides
b three equal sides
c a 90° angle
d a pair of equal sides and a 90° angle
e three equal sides and a 90° angle

better • To be labeled an isosceles right triangle, a triangle
must contain

a a pair of equal sides
b three equal sides
c a 90° angle
d a pair of equal sides and a 90° angle
e three equal sides and a 90° angle

eh™ni,i‘’r)! ^ =>^6 all correct This overlap is 

mill in 1 ^ the question or stem The use of the word 

he tudlm 1 a that enables 


MATCHING FORMAT


a„TS;t‘r2:^^^ 1,“' ->PP'P -IP- 

this tjpe of test is used to detenninr, *1! *™® Pnmanly 

tmguish between similar idea« f student can dis 

«1I> considered toTethem^, 'ypt 

to construct ' >yP= of short answer item 


ExampU 


(1) Mas the fatherofour 
country 

m "“““"'’fo president 
(3) ''OS the first secretary 
tne treasury 

M) hilled acab.net membi 

in a duel 

“'''“ration 
‘^Qcpendcncc 
«) 'hatred .he Cons, ..u„, 
^omcniion 

’ o7h‘?s 

«> ,7fc"C'Iha,.„g„„,j, 

hfc to eitc his country 


a Aaron Burr
b Nathan Hale
c James Madison
d Andrew Jackson
e John Quincy Adams
f Alexander Hamilton
g George Washington
h James Monroe
i John Adams
j Thomas Jefferson





(1) The argument 

(2) The opposing argument 

(3) The resolving argument 


a synthesis 
b prethesis
c thesis 
d antithesis 


Pros and Cons Matching .terns are tun to take because they 

arc hk™ru"Jcs and 

together They also embU tlte te 

a single item In “^‘'degrM of efficiency A third big 

“uV^mXng ^ 

^"'zthmr:-si;;Thavesho«« 

IS that ihey are diffcidt end /line ^ appropri 

writing one matching exer „„l,ng a half a dozen of an 

ately be called than items ) 

other type of together yet each must be dearly 

because all the pieces be pointed out that 

distinguishable from every eUcitmg all types of infer 

matching exercises canno particularly those involving 

lists of potentiaUy «"f7J3^„volvmg mdividual pieces of infor 

work well and other situation . n,a„;hmg raer 

mation, where they do dues withm the items themselves 

cises have a tendency to provid and every 

Like all puzzles, pieces remain and are therefore 

time a correct fit is match the stems and responses 

ea”er to fit Students “f “ ^ „p„on can be reused), thus 

they know go together (a of the remaining 

immediately reducing th responses has been reduced 

tasks because the number o correct response 

¥hen, too, stems the shortest stem) Effective 

(eg, the shortest r«po«^^ these problems 

writing of matcni b 

The first rule for writing a matching exer 
Writing the Item Th n common elements of a single 

CISC Ts1L^achitemm^‘f^l;^ of . flower) Because cluing is a 
category (e g a ^ 


(11 g (2) e I3I I W “ 


, , (6) e (Z) •> “1 * ° '' ° 






'“■■wnSfthifl5!^‘oreiraseh"® “f 

PO!s,bk (This mfacMsan^l ^ “void dmng where 

particularly those where responsi“^„r'"'’"® 

‘■Pns are Keep ,l,e respojes lhor, ^ ®“SS“ 

Jd notioverlapping, provide p/ans(W^ responses distinct 

mafc/i Muh any stem These extra responses that do 

m the left hVdcZ^„ ">= "S^t hautcL'L'tuh 

poor •  (1) ubiquitous          a harmful
        (2) enigmatic           b equivalent to
        (3) deleterious         c widespread
        (4) tantamount to       d purposeful
                                e puzzling

better • (1) prosthetic         a predictive
         (2) pathetic           b itinerant
         (3) prophetic          c artificial
         (4) peripatetic        d abundant
                                e pitiable


, , „ *1,. ..pm words bear little relation to one 

ano‘ther.’’heLe!\t.le or no “nai 

and does not relate to .),e words all are potentially 

adjective In the "better ^ay affect their dis 

confusable (at least ^ associations tend to become 

cnminability in ' madvertantly provided and the 

blurred in memory) No c ^ , c .. another look alike" word 

distractor is a synonym for p 

to those provided imperative that you mctudc all 

In fairness to the stu matchmg exercise Matching 

the necessary fil.r <i2e and complexity can contain an 

exercises, by virtue ot me . , must deal and can be used 

entire situation with t „„ and analyzing Consider the 

to measure sequencing, c 
following example ' 

• Read the statements below, carefully paying attention to
their relation to one another. Next to each statement mark

a if the statement contains the central idea around
  which the statements can be grouped
b if the statement contains a main supporting idea of
  the central idea
c if the statement contains an illustrative fact
  related to a main supporting idea
d if the statement contains an idea or ideas which are
  irrelevant

Test A Guide for Teachers t 

.Taken from Ail nghts reserved. 

© 1959 by Educational 
permission) 


MIC IZle W 
To both exercises I'J 





(1) The Roman roads connected all parts of the Empire with
Rome.

(2) The Roman roads were so well built that some of them
remain today.

^ gTOatest achievements of the Romans was 

f41 w n? f ^ durable system of roads 

coachK used horse drawn 

fe-fr^ATerdtardT; 

roadr""''^^^ Italy some of the roads are onginal Roman 


When matching^eiTO*f^ on ^ Particular stem 

satisfy this requiremeSTh^n t, *“ 

MU. sy slemcc response pl.»eros“no not con 

scramble the order from one matrU °'™ ability to 

™“te systematic than you mav^h!nV "if another You are 

cts (as suggested earlier) or numb random num 

“ Finall 

c«mn!'™^'""° "anrsmraTt"r'°G'“ "" “ ««cstion 

^uT T P"8= “» "las mmy ,h,L b“''f Washington in the 
uun.iy but the only Perso" "d®m’’“'^“ •*'= father of our 
rSnons “'■®' Washington) Krontb father of our coun 

longer an7''°"^ on the^ngh, w7h Z *''= '‘=f‘ and the 

Gcne^r?'P'"'l"'’f"'=«''° ■" *'= 

cult or to elicu''a"' *° "lake an^^’° “‘"‘"I’' 

'"ustrauonf ' " "" Cassificrtirn aT m 

'“=> in the preceding 


Ans-Afcrj 


'•'‘ICISlelsie 





CHOOSING AMONG SHORT-ANSWER FORMATS


The six types of short-answer items are summarized on a comparative
basis in Figure 4.1. While each has certain unique features that
make it useful for specific testing needs, there are occasions when
two types can be used interchangeably. In deciding which type of
test to use, consider item writing difficulty

and specific measurement Multiple choice 

used extensively iStS to difficulty in prepara 

selves better to item analysis an en 


complex processes) clearly can be refined and i 

selves better to item analysis and hence 

proved unstructured and completion 

For the “"<=-""’‘=^“"0 class te . 

Item types might be ^ wuh a 

straction and because scoring niy elemen 

small group These item typ j-ygioped the test taking sophis 

tary school children who ^ ^^,5 por older students these 

tication of their older co IP ^ responses 

free choice item types ^ . altogether the arguments that 

(Some teachers may Pr®^“ boice items and thus use more struc 
can ensue from scoring free chom^^ ^ 

tured formats with any ag under a wide range of circum 

True-false items can be u^ ^boice or free choice 

stances as a ^“bstitute for e th^^ be very difficrrlt to write 
Items although good fake ^^^aeptibdity of trui^false items to 
This difficulty plus the g generally useful short 

guessing, makes them one 

answer types consuming to score particularly on a 

Matching a‘e best used on an occasional b^sis for 

large scale basis They are to ,.,uat.ons however 

variation or a rhang=/bust format to measure recognition of 

kno'vSe or”omprehensmn ^^^.ure the stu 

Othir twochoice detn ,b, same conmxt 


knowledge or comn"-- most reaony _ 

Other twocho.ce Item ,i., ,ame context 

dent's ability to classdy a j ,hemselves quite ^ 

as the free choice types ” m learning classification tasks 
sheets as a way “*fprocess) These points are summarized 
- f> mstructio”*** t' 


sneeis as « 

(within the instruction 

Figure 4 1 





Figure 4.1 An Overview of Types of Short-answer Items

Columns: Type; Format; Sample Item; Difficulty in Writing;
Difficulty in Scoring; Measure of; Recommended Use

Type: Unstructured
Format: Free Choice
Sample Item: What form of economic system is most often
instituted in African and Asian countries following
independence?

Type: Completion (Fill-in)
Format: Free Choice
Sample Item: The form of economic system most often
instituted in African and Asian countries following
independence is ______.

Type: True-false (Yes-No)
Format: Fixed Choice
Sample Item: The form of economic system most often
instituted in African and Asian countries following
independence is socialism. TRUE FALSE

Type: Other Two-Choice
Format: Fixed Choice
Sample Item: Circle those African and Asian countries that
have introduced socialism upon achieving independence:
INDIA GHANA ZAIRE CHINA SOMALILAND

mvA 

Type: Multiple Choice
Format: Fixed Choice
Sample Item: Upon achieving independence the majority
of Asian and African countries turned economically to
(A) capitalism
(B) laissez-faire
(C) socialism
(D) mercantilism

Type: Matching
Format: Fixed Choice
Sample Item: Match the countries to the economic systems.
(1) Capitalism
(2) Communism
(3) Socialism
(4) Isolationism
a South Africa
f> SellanVa 
c Ghana 

— d Madagascar 


Measure of: Recognition of knowledge
Recommended Use: Multi-group/repeated testing

Difficulty in Scoring: Easy for small groups but
Measure of: Classification of facts
Recommended Use: One-time/one-class testing

Measure of: Recognition of knowledge or comprehension
(or occasionally of higher levels)
Recommended Use: Multi-group/repeated testing

Difficulty in Writing: Most difficult
Difficulty in Scoring: Easy for small groups but more
difficult for larger ones
Measure of: Recognition of knowledge or comprehension
Recommended Use: Change of pace





Additional Information Sources 

Gerberich, J. R. Specimen objective test items. N.Y.: Longmans, Green
& Co., 1956.

Gerberich, J. R., Greene, H. A., & Jorgensen, A. N. Measurement and
evaluation in the modern school. N.Y.: David McKay Co., 1962.

Green, J. A. Teacher-made tests. N.Y.: Harper & Row, 1963.

Scheier, I. H. What is an objective test? Psychological Reports, 1958,
4, 147-57.

Schoer, L. A. Test construction: A programmed guide. Boston: Allyn &
Bacon, 1970.

Wesman, A. G. Writing the test item. In R. L. Thorndike (Ed.), Educational
measurement, 2nd ed. Wash., D.C.: American Council on
Education, Chap. 4.

Wood, D. A. Test construction: Development and interpretation of
achievement tests. Columbus, Ohio: Charles E. Merrill, 1960.





Self-test of Proficiency

(1) a. In a free choice format, students are asked to ______
(state, identify, evaluate) the correct response.
b. In a fixed choice format, students are asked to ______
(state, identify, evaluate) the correct response.

(2) Match each item on the left with its type of item from the list on
the right.

a. The capital of Maine is
1. Bangor          i. other two-choice
2. Portland        ii. unstructured
3. Augusta         iii. multiple choice
4. Bath            iv. completion

b. The capital of Maine is ______.

c. What is the capital of Maine?

(3) Which one of the following is not a recommendation to follow in
writing multiple choice items?
a. Vary the location of the correct choice.
b. Make sure that incorrect choices are implausible.
c. Avoid letting the item clue the correct choice.
d. Make correct and incorrect choices about the same length.
e. Make sure that incorrect choices are completely wrong.

(4) Indicate whether each of the following statements is true or false.
a. Completion items should have a single correct answer, preferably
a word or short phrase.

b. Answers to earlier items should help clue students to answers
for succeeding items.

c. In writing true-false items, one useful rule is to include absolute
terms like always and never.

(5) Given the objective Name the first three presidents of the United
States in the proper sequence, write an unstructured, a completion,
a true-false, an other two-choice, a multiple choice, and a matching
item to measure this objective.

(6) Given the objective Add two f-diglt numbers, write one of each 
type of short-answer item to measure it.

(7) Which one of the types of items listed below is the easiest to score?
a. completion
b. matching
c. true-false
d. other two-choice





(8) Which one of the types of items listed below is most frequently used
for repeated testings with large groups?

a. multiple choice

b. other two-choice 

c. completion 

d. matching 

e. unstructured 



chapter five/Essay-type 
Items to Measure 
Thinking Processes 


OBJECTIVES 1. Describe the meaning of four thinking processes,
namely application, analysis, synthesis, and
evaluation.

2. Identify the component parts of essay-type items
written to measure each of the four thinking
processes.

3. Construct essay-type items to measure the student's
ability to use each of the four thinking
processes individually and in combination.

4. Name and describe the criteria and procedures
for reliably scoring an essay item response.



THE USE OF ESSAY TYPE ITEMS 


In the preceding chapter rve focused on the use of short answer 
Items to measure knowledge acquisition and f 

e\er we are often as concerned with the ability of students to 
uLu, and use what they know as we am then simp y 
knowing It In these ms.nnces type^^ X “ po„se ca^ 
greater latitude m ‘he fo™ ‘h P 

be restricted to a w ord or phrase ^ own responses 

null theopporumity ,ests^enable them to demon 

u ithui relatively broad limits y analyze to synthe 

strate their ability to “PP'^ o® ,he light of their know! 

Size, and to e\aluate new in ^ purpose is to have 

edge Avoid using ''™Vttey Lve acquired certain knowl 

the students demonstrate that y 

edge ,s characteristically a more 

The process of sconng the Hem one reason 

difficult one than the P^f” u usually be many times longer 

i;:-ritrirx»f-^^^ - - " 

devoted to sconng of essay ‘tem® 


- ^^^^^Vn^rhaoter we will assume that objec 

Again as m the preceding P ^.Q^tent outline developed 
tives have already been nd construct and subsequently 

and that you are ready f ^ subdivided for presentation 
Essay items » /i\ annlication (2) analy 


ana tnaiyou rtiw ^ .,^mc will be suDaivjuc« av.. r*—- 

score essay tests Essay '‘=®/ '"j3sure (1) application (2) analy 
purposes only “luation (Hole .hat the types of 

SIS (3) synthesis four purposes are similar the sub 

d'wLT “ p?rmanly %7;®;Tdrs1o'’rXr™c.mg essay items 
We will n°"' different purposes listed above 
>>f» used for tne 


that can be used for 


ITEMS TO MEASURE APPLICATION 


Application refers to the use of knowledge in the solution of problems.
As set forth by Bloom (1956), application involves

the use of abstractions in particular and concrete situations.
The abstractions may be in the form of general ideas, rules of
procedures, or generalized methods. The abstractions may also be
technical principles, ideas, and theories which must be remembered
and applied.




Examples^ 

• You are in charge of planning meals and ordering food at 
a small summer camp There are 100 campers— boys aged 
ie™ You must be con 

beX e " r* value since these will 

five da vs ft 'Yrite out menus for 

fvhvvoX H “-J ^«PP«s and explain 

you made the choices you did 

TfiTe andX ‘’T.f '“/"‘P*’ "verfiows at the 

hot water E^xX/“! °'X ^ 

in the productio " overflow of hot water aids 

geyserrused n ,h P""“P'<= °f 

Ss mTnt or mo '’XT'™ “"Sy (that I, how it 
described in class) ° ^"^vgy producing techniques 

perature m your bfui an?tim°T^'^'^F’° 

the degree markings hat h metal that contains 

about makmg a sefo^ 1°*^ 

read the tempe^aXelsm^or?;’'”’®' Y™ 

more than one side of a naL t *^™°'Pater’ (Do not use 
■ue ot a page to wnte your answer ) 

^Vriting the Item A 

canon must require thatThe studem"* *“ “aaaure apph 

acquired (probably m school) tl d “*= '‘"“wledge that has been 
concrete situation Thus ih c ^“'nbe a way of dealing with 

aPPhca.,onis,hat,/,e:,em m„:,®X™!= "" nieasuremLt of 

ha can somehow be incbd^' X f sitncon^ne 

mca,nr‘‘ •'> 'v'-'vh they «i^ ^'“dents being 

same Iiei"'"*i.°^ applicaiion is that tl ^ second rule in the 

“ or choice trL* ’•cqnirc that 

choice shn 'u i "’sk The thir/ T silualion in order 
"°norpretX?f'‘P'v problem itself ,h°t „ ‘''ansmitted 

"Xl;s r 

«“:a£r4''3i.™Trr"'“ « 

ship bclwecn the , “ '“P™ 'nil occur mcr obvious 





teacher can increase the likelihood that application will occur by 

increasing the salience of the relationship between the knowledge 

and the task For example, the .tern might be T^o 

tion, apply the knowledge you hate ^.ned in the unit on interest 

rates Such a direct instruction establishes the 

tion of school knowledge as a entenon f f 

and IS likely to increase the degree of ^PP" tirfolu 

the relationship within the prob 

tion can also be increased m mo words In the 

lem statement ■«'='f^f‘^^J’f^y..'"mnt.onal value" may provide 

first example on page 1 12 the t basis for the preced 

a Strong cue to the ^ f n bfprlpteir by di 

ing weeks of instruction Th , PPj^ explicitly making 

rectly instructing the stude ^„,„g ,hem toward it by the in 

it a performance criterion), or by cm g 

elusion of terms dealt with m e (bought of as being entirely 
Essay questions detail provided, the 

without structure The gr you intend it to In the 

more likely the ‘f d Si must be sufficiently present m 

application item, statement of the situation and the state 

the two major parts— the 5“ ^ all students to work 

ment of the context The less the detail, the 

within a common, fetation simply to understand what is 

greater IS the required ^ „„ge of responses 

required, a situation 

and makes scoring difficult (2) problem there 

In addition to the (1) gjure application called (3) re 

IS also a part of f rremSents an attempt to structure re 

sponse instructions, ^ instructions include 

snonses to some degree R P j, specific points to be cov 

arrminimum or n-axmimn length ^ solution in 

ered or performances ^,ng ,he number of suggested solu 

aSlnion m descnb.ng d f„n evaluating performance 

tions required, and c . neatness, and spelling 

:n.in®g skills, ““rsdiiation and compose the item 
create the approp™ 





ou are interested m a summer job and have learned of one 
moner f T Write a letter in 

an^whv ™ “”P describing yourself 

o? hi dSr IT ™ basis 

Its neatnel J" '"<=" ™‘ten your letter is, on 

neatness, form, and spelling and how convincing you 

gray area betwli muhTnlT h^* application is in a 

choose between SZ, ink ^o 

sider the basic "re,uZSsZyrrob™ctr'‘ 


ITEMS TO MEASURE ANALYSIS


According to Bloom (1956),

Analysis emphasizes the breakdown of the material into its
constituent parts and detection of the relationships of the parts and
of the way they are organized. It may also be directed at the
techniques and devices used to convey the meaning or to establish the
conclusion of a communication (p. 144).


Examples 

Galileo was »ntcrest*.ri 

•ion VeurZ ' ■""="'0'^ of^me ol °f -"oo* 

balls down studied thp T^bich accelera 

nnd used ihe inclined 

—Jlhel'S:;; 'boY -rnlZaZZr 





• Identify four reasons why Hamlet did not kill King Claudius
until the end of the play despite his commitment to do so at
the beginning. Describe how you determined what these
reasons were.

• You have just seen a movie about the United Nations. What
are some of the reasons that the UN was founded in the
first place? Can you think of any other organization in your
community that attempts to contribute to the betterment
of humanity? What are some of its activities?

. YouTave fust heard a stoty about a girl who was severely 

Describe one such time 

Writingtheltem Simdar to apphca.ion hem^ 
typically include a do not contain problem 

Unhke application the student presumably is 

parts The situation is one wltn relationships, or organize 

familiar and that contains e items often ask the 

tional princtples which can rOTtrast, as seen in the example 

student to make comparisons and 

A c Ned and B F Skinner might teach 

. Consider the ways A S^Ned^^^ ,^,hn ues be 

a child to swim In eame’ How can you tell 

dtfierent’ In wbat ways the same^^^^^^^ 

As in the case of apphcat.on items t response 

reStTf^^^h-rsCra^^^^^ 

tional principles However, tte mem ac cated for by un 

basically short answer pel utilize the vehicle o ju^h 

structured or tree cho'ce ‘terns To u 

.tern. It IS useful to ask her invitaUon to expand upon 

the analysis was ^ccomp is ^ ^ jo on'' Smce the dis 

an identification IS to ask the on Smw J 

instruction may also m chapter - "tacle^^pr^ 

Unction between essay it ^ ,s no reason Y 

manly for instructional purposes 





objectives and hence multiple thought processes cannot be mens 
° "■‘= 
strucnaUo‘’nrrdnr“'" 

studdnts^m'iMrf conjunction svith 

experience Tht? 'iK.it *‘”ES conmned in and provoked b> the 
parts of an csDcnenr* ^ ° nnaljoe or difTcrcntnte the component 

(1961) an imp:rnr„r““^‘‘‘"®r’'’ !>"■> Schroder 

and solutions and hence is umthvnf”^ Producing unique thoughts 
While Bloom rmcli measurement 

short answer Items for m n^c of multiple choice 

recommended hero /or ='PP™''ch is 

It IS estremely difficult to wrim"^l'° "c?* '*'9"' “"P 
choice analysis items since to do^hifo^ ° d'slnctors for multiple 
ation thoroughly and know m-.« analyze a situ 

faulty thinking Then item leshn^ pilfalk and cul dc sacs of 
choices of right and svrong “ns"erT''An °p'’' 

sents considerable imprfclicaliiv r^ “I"'* Preparation repre 

teacher who evdl find the usror m classroom 

“P't^'ect despite the difficultv of seo°^ '‘CBs in this instance an 
o l^c ittst chapter the multinle oho l*icm As was mentioned 
“St and IS cfcerorm!lL"J,'^ raw maicrnl for 
ployed in published tests I '='’“™“'-'="cally most elTectieely cm 

ITEMS TO MEASURE SYNTHESIS

In describing synthesis, Bloom writes

whSm” "^'S » me^cllrg’^ ^0 

,p - theoretiea, anu 

However for tho 

cs to collect plausible 





Bloom (1956) subdivides synthesis into production of a. a
unique communication (like a story), b. a plan or proposed
set of operations (like a machine), and c. a set of abstract relations
(like a theory).

Examples

• Add a second verse of four lines to the verse written below

Men cannot swim 
As fishes do 
They only slave 

A hard way through 

. zr "crri 

presidency of ""“tr posmon Give the reasons be 
president leaves his or her p current con 

hind your cho‘« ,,ee.presidential succession ) 

stitutional m charge of raising money to 

Writing the Item. Bloom (1956) warns us against limiting
the creativity required in synthesis items by making instructions
and situations too detailed:

If the effort is to be rather creative, the student should have
considerable freedom of activity: freedom to determine his own
purposes, freedom to determine the materials or other elements
that go into the final product, and freedom to determine the
specifications which the synthesis should meet (p. 173).

■“ “Xem to'brs^lvS The d»e 
SrSUSionZ-rthrs^^^^^^^ -ouims that the 

?hl fnt^SSrnol^ve"^^^^ 

a mechanical lung' 






j © Punch 

Student go beyond h i. 

a ■“ '>'■= ^ 

mat.cn L"'"/ P™‘‘"«>ons based on Productions (These net 

student form"'!' society bm '“"’“td ‘nfo’ 

«say,tem ,T^V"/ "t™ ) Thuf f°f ‘h 

rmL of^i, , measure svnth statement in a 

oU^eujT‘“"' rV''™''' *’= of <h 

‘•esignmg a nove°"m''^""r® “ '■^‘■ve piece'^'to'm^ 
are among the proWe“ «l''‘Pment, pronosm 

Muremer '““tments of svmbe ® Procedur 

problem or a clos'e '‘''°uld have ’^“^1 f° 

J^P«:iric solution thr *^^?*^* "or had direct 

he synthesized mus h^i'r™?"™ (althrueh '>■' 


"’“■hecompSr;^^;"®;-' ■hmlcmg^Se’ ®'"" 

'ty °r,g,„„, I s,“dP™'=ms to measurt 
rodents frame ofreferen 



Items to Measure E\aJuatlon 119 


Bloom (1956) is also concerned about response instructions
in essay items for measuring synthesis. With regard to the
variable of time, he writes the following:

Many synthesis tasks require far more time than an hour or two;
the product is likely to emerge only after the student spends
considerable time familiarizing himself with the task, exploring
different approaches, interpreting and analyzing relevant materials,
and trying out various schemes of organization.

It may tvell be that the procedures a te-her u^es_for measur 

mg synthesis should „„ide more of the conditions 

home exam for raample ^ j„„onal classroom test Per 

described above than would a mputs from other 

haps students should be “ sXtions In some eases 

sources that can be material that can contribute 

the teacher might provide resource m 

to the synthesis task . thought that synthesis 

It IS important to ' . " favorable to creative tvorh 

should be measured “"ff,"f„„der conditions that are perhaps 

and that testing IS usually d instructions should be 

antithetical to creative work P „onai test and put the 
designed to break the mo d of tne^ ^, 5 ^ 5 , have provided 

Student more at , e WaUaah ,he 

clear empirical evidence . Untimed tests mstruc 

measurXnt of creativity (J "", „„,elty or originality is 
nons reinforcing the idea that _,,ght or wrong 
to be strived for a statement tW .,^^3 ,.„ed a, 

answers are appropriate a 
measuring synthesis 




ri956) writes 

Of evaluation Bloom judgments about the value 

mg the extent o ^ 

cal or satisfying j.-tmEUishes between (1) evaluations 

and the absence 





ecLcml" f ''“^ards such as efficiency, 

y, utility of means for ends, and standards of excellence 

’ Wo'rThr^“f ^^'“=‘'■"8 P“em given 

poe^sJmctum m'T ‘he 

symbolism Vmt * o^'g^^ization, form, meaning, and 
cSla ffiolvh >1?’"”“'’^“ ™Iuation should be made 

defended i'Se uZ °-- 

time for revisions and j carefully Save 

appears in vour exam ^ ^ceading so that the essay as it 
intention It is sugg"IS ,h”,’’°°'''''‘ ■'cprcsents your best 
planning, eighty to wntino * minutes to 

essay P!ease try ,o ivnle lefibh'} ‘''=''‘*"''8 

not be counted against you/ ^ (nhhough handwriting will 

S:==K=r 

Now at the last gasp of Love's latest breath,
When, his pulse failing, Passion speechless lies,
When Faith is kneeling by his bed of death,
And Innocence is closing up his eyes,
Now if thou wouldst, when all have given him over,
From death to life thou mightst him yet recover.

- he., 

ment dlii.i ^ '““nccssfuHy noim ° ^mih dis 

Vcu"; senem, “ 


™nt tllustrates a geneS”^ P°»'i"E out 
Your essay should be ,b ^“^ngUi of the L 

Pd8«.n.ongha„it.'" ’‘’™‘ «ords m'fenrth (T- 


othl O I9S9 4 A'’' 'f' CterS ?" "563-1631) 

' ‘ oe necessai 



Items to Measure Evaluation 121 


• Above is a diagram of an electrical circuit in the starting
mechanism of a machine. [In the actual problem a diagram
is supplied.] Do you think the circuit is sufficient to start
the machine? [The specifications of the machine are also
provided.] Write a short report stating your evaluation of
the capacity of the given circuit to start the given machine.
Be as specific as possible in your evaluation and provide as
much evidence or support for your evaluation as possible.
Keep your report to within two pages.

Writing the Item. ( 2 )™LpoMe instructions Re 

that which IS to be information about the cniena 

sponse instructions also >“0 m® 

that are to be used in the ova ua ' to evaluate As illus 

The student must be g‘„ niay range from a poem 

trated in the examples ‘*’^•1 circuit Anything that is sub 
to an organization to an submitted to evaluation Thus 

ject to rational o»n™^“;;“"ly across subject matter areas 
evaluation items can rang ,hem a criterion or cri 

Students must also bojo avaU y. based Will the some 
term on which their ^ping the peace’ Does it con 

thing" work’ is It simcture’ Have its effects been 

form to the acceptable m „erion questions that can be 

positive’ These are ' “1, and more detailed criteria must be 

Lked Beyond this, addit on^ Students may be expected o 

supplied by the students them them m 

the student has learne evaluation items as , 

view are pernap^^^^l_^^jj„j^ents aM j,^unguishes be 

?r“m amending anb supporUnS^ .beir thinking relative 

dents demonstrate 

to the teachers OBJ 





"T cvalualion proudcs 

evaluatme It '"g lo be etaluated a general cntcrion for 

criteria as wein '’n occasionallj more specific 

admonition to prov"lXall' unnrt 7'™"''™' 7“'’ “ 
tion or judgment ™ support for one s ctaluatltc posi 

answer "’<= short 

produce responses ttama cssa> Items 

cult to score (Short answer 'll "“"5™'’“"’’’' 
directly comparable responses course produce more 

choice Items ) In order In ft. J“'^"™larly in the case of fixed 
essays foot equivalent essays biiin ''hchhood of comparable 
Ipes of judgments or evaluations 7' compared) the 

the Item It IS hoped that S r, set forth in 

for some students when asked '’'"'"a'c the tendency 

example to write I |,ked this nn ^'’^“a'c the poem in the first 

h^^'Jf'hensfon meJ^tteundllym^^ ™ss" of lcd“e'and 

™niierofc“si;™“* •°op-,rm‘^=!;::rLt' 


ITEMS TO MEASURE COMBINATIONS OF PROCESSES

A* 


SS- ^ f -tt^ent to 

“vaCe^r 'orh^™™^ ”p™S TSor'f ’f*' 

problem in ^o\uUoxi {eg cDnsrnT'l^^ 

according to piv ”^®"®Sement and then ^ solution to a 

student fo havr Another comm^'r'^ solut.on 



Items to Measure Combinations of Processes 


at the poster at the front of 
could deal with any one of a number of 

ole a fund dnve, a clean up campaign a political campaign 
iihohsmt" iescnbe what you think the art.t was ^ 
mg to say and " m^she made use oj^^ 

“ °ces:tul' wouW ‘you?ay the artist - m getting point 

your mother or ^"“t ^spei^s "fthe /ob For exam 
tion, discuss all he „ a su,t’ Does he or she 

pie. does he or she j'ob require physical 

work indoors or outdoor details’ 

strength’ Does it requi Then evaluate 

Does It offer 0PP°‘^“‘ vou have given in terms of how 
each part of the j, your idea of the perfect 

rob^you Send youriudg^^^^^^ ,Hen 

po°se1f’’so “describe the better way 

1 . cvnthesis, and evaluation can all 

Writing the Item le^Tart Hem as illustrated by the 

be combined in a ’>"8'= Tots an object an organjzation an 
first example Giving „„a/yee its parts or workings is the 

occurrence and asking them or workings is the ^P ^ 

first step Evaluating '*'“/ „ „ through synthesis is the thir 

redesigning or improving P combination approach is that 

^.^The maior shortcoming “ ‘ ,s a failure in analysis 

“ iFp responsibility P. „tion item has the advantage of 
(aTLst imphcitly) evaluation and synthesis m order 

“he‘to‘taHh'nkmgP^“; performances of 

'“°Tt w^*d be (1) those that make no mention 

3 yp,hes.s and evaluauo 





'‘® ™pl'Cit, (2) those 

OMtee '■ " ““b, notion items), and 

of andysrandVvTo?'^ ■'■‘'strates the combination 

to anSi a ” P™‘°‘yP= .s for the students 

uate Its componentsTaZt’^^Tert"™ ‘=''“' 

themselves must determm/. tii ^ cntena Often the students 
various aspects of a mrt i ^ m this illustration the 

job ideals ^ particular job in terms of the students' own 

tion— depending on hmv crcaUve thT*'’?'' P°S5ibIy apphca 
tion The value of this nartienla s°'“tions are) with etalua 
students with the oVrtunUv " P'-°''“>‘ts 

creativity to be functional h^. ““'"“y '=“1 mnuences that 

evaluation HoweverTmtX requirement for self 

to generate their own criteria for i ^ students if they are 

provided ) Also the teacher mus, f'”' evaluation are 

students generate their own evaLaf ^“ct that when 

sponses will not be comparabk 1 f '•’em essay re 

stances the teacher m^be nre ” r d™' I" auch in 

51°““ ‘hustrated by the tesMnZ l '^ Primarily to the 

than to the content alone ^ ehall see below) rather 


CRITERIA FOR SCORING ESSAY ITEMS


Content 

Criterion 


SliHf *'» "■ 

Although the reason f 

petblVamo'““7 ™h‘ 

■■•">•■■" sx £™i 



Criteria for Scoring Essay Items 125 


the student provides leads you to conclude that he or she is kn 1 
edgeable m the area of the es^y Since know ledge is Ae jder 
pinning for thinking and t ' , 0 ^ In elsenL 

will be needed as a P’'"=’“;f,hrp',^'qmsite knowledge that the 
the content criterion reflects the p Q 
student has acquired in a particular area 


. , clfillc are important components 

Organization and other wri g academic per 

of essay “"I criteria for essay response scor 

formance as well If more formally recognized 

ing their importance is li^eiy 
by students 



A 


compos" 


My Trip to the Zoo 


wll cost you 3(X 


Organiz. 

Criteria 




Process Criteria 


look for as evidence of organization’ Inttially 
recom^r^^ '! P^Wem has been set up or introduced When 
recommendations are made they should be accompanied by sup 

•>'= ^de*^ to tell Lich 
Sntrco lub b and which supporting state 

tion nfar. The traditional organiza 

arclu“or“’'‘”““'’'-^'’“"^ an introduction a boly and 

loud^'forsShTZL" »”nking out 

should be encouraged to nrl™''* *° ’'“’"'’'‘"S Students 

fore writing A straiem^ r^*^^ outlines for their responses be 
thoughts reflects a Wi^l ^ organization for communicating 
and aeciuents Ih,. r .hrlief' ^ “f progression! 
looked for and evaluated bv the ! “ “c organization should be 
'■on spelhngandgmmle’lL In add. 

m the communication process Resn° evaluated since they are aids 
students that these vanous asoeSs o"f'® 'nstructions should alert 
evaluated aspects of essay organization will be 

non an!l^!'s°Jy’’“,^“ "naasure applica 

t-nofthem themisTimportamc!!, " f'"' 

Each of these protisses rS! ff "'-ro carried out 

m! ‘’'"“"a of "a ■raplemeni!/°^‘r°"j°‘' ’’^'■^'nmendatwn 
* ™PP0rtmg that saluhon or r ™sons for jusufy 

Seq^lhr" 'J.eTd~;Th" A 

‘"“"'t "tc five steps shown 

■ "cfinc the problem 

«)We!ra!.emr™“'“t.o„r 

■ng on the nature of the task 





to be done In general, the first three steps should always be pres 
ent and tt the first «ep has clearly been provided in the it™ 

Itself, the second and third Ihird ™lp Tmarb"^: that 

ance requirement Within the secon 

only a single solution need f "^iX^proc^ (The 
defense or support would mininfum number of solu 

teacher w ould hat e communicate this number to 

tions ncccssarj and maj or may n 

the students ) cnlution and reasons to support 

The c\aIuation of » Problem reasonableness in 

It can be made on the l>asis j^teness and consistency m 

terms of external .^3) ongmahiy or creativity Accu 

terms of internal criteria extent to which the proposed 

racy or reasonableness re j ,5 to >ield a satisfactorj 

solifon IS judged to be ''f Have the co^ect 

outcome Will the P'-°P“*=‘‘, Lda Is the synthesised product 
analytical dimensions been M ^^n-ectly evaluated’ Since 

appropriate’ Has O'® ely correct answers to essay 

there kre mrely end obj«. T ^ „„ ,h, 

Items, the decision of solution 

ment of the scorer .„™ested criteria only procedural rec 

x=siS==r.%"— 

applied to the reason^ 

"’“Z'oue^uon of "'"P'trAe suWO'ting material is appro 
primarily to ^‘’;,trposed 

priate for and ms tn as high scores as com 

It suffices in^d^ responses vvd! have a preconceived 

;ffi.eres"obvioudy_^.oa_ *^„p,c,e ^^.^ver to use as a basis for 

--^^tpoVar:"^ r„v„" 

0ngmal‘iyj ,hem ^say Again the judgment 

:tS'cr=d testing fotj-^'^rto recognize the unexpected and lo 
of the scorer is o®' 
credit it 





Criteria and 
the Student 


Three Kinds of 
Scoring Scales 


It is important to recognize that essay items are intended to measure
discrete objectives that have been prespecified in a content
Jr! students should be informed of these objectives 

von ma in goal oriented learning Objectives as 

y^u may recall include mention of criteria Consider the following 

* w" be able to 

eva eaOnn symbolism and defend that 

evaluation in an accurate complete and creative manner 

will brappSTo'thI 5bouId know the criteria that 

may goald^Lt them Jl! responses so that they 

set of judgments the ha« r being a private and mysterious 
criteria shouU becl^ihem", ■ <= the 

over performance feedback lo ^ ^'■bject for learning More 
a grade Students can learn from tP"*®'" ™orc than 

how their behavior fit the criteria '’ai^H°™°"‘^' “''P 

did not The result will undS.ni u '“Portantly how it 
students ,0 generate el^ remo ' 

m terms of scoring critena judged successful 

There are three kinds of scales that can be used for assigning
grades to essay responses: an interval scale, an ordinal
scale, and a nominal scale.

Interval Scale A 


-l'm^”o£i°F‘'“ “ "r‘“‘ 

together have a wm u. bown in Figure e , p ^ ' tind his or 
'he other Jo ^'“8 of thirty L com^ Pritena tal 





Figure 5.1 A Sample Essay Response Scoring Sheet

STUDENT ________          TEST ________
DATE ________             ITEM ________

CRITERION          WEIGHT   POINTS     (JUDGE A)   (JUDGE B)   (AVERAGE)
                            POSSIBLE   POINTS      POINTS      POINTS
                                       OBTAINED    OBTAINED    OBTAINED

CONTENT                     10
ORGANIZATION                10
PROCESS
  SOLUTION
    ACCURACY
    CONSISTENCY
    ORIGINALITY
  ARGUMENT
    ACCURACY
    CONSISTENCY
    ORIGINALITY

TOTAL POINTS POSSIBLE = 80
TOTAL (AVERAGE) POINTS OBTAINED = ________

PERCENT SCORE = (TOTAL POINTS OBTAINED / TOTAL POINTS POSSIBLE) x 100 = ________

COMMENTS:

^n scoring hej J ““’S‘’.74“4nous pracess points) Your 
Limed each category ^ a Lndard 

rnte^oL^S each essay response The following 

rules'^mus^eestabhshjid^^ e,er> 





(2) read each essay once (Judge A), shuffle the order and hide
your first scores, and read each essay again (or have a second
person, Judge B, read them);

(3) after you read an essay, use your best judgment (and your
outline answer) to assign it a numerical score of 1 to 10
(10 being the highest or best) on each criterion.
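
The arithmetic behind the interval-scale scoring sheet (average the two judgments on each criterion, sum the averages, and express the sum as a percentage of the maximum possible score) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the eight criterion names are taken from the sample scoring sheet, every criterion is assumed to carry the same 10-point maximum, and the function and variable names are hypothetical.

```python
# Illustrative sketch of interval-scale essay scoring.
# Assumption: eight criteria, each scored 1 to 10 by two judgments
# (Judge A and Judge B), all weighted equally.

CRITERIA = [
    "content",
    "organization",
    "solution_accuracy",
    "solution_consistency",
    "solution_originality",
    "argument_accuracy",
    "argument_consistency",
    "argument_originality",
]
MAX_PER_CRITERION = 10


def percent_score(judge_a, judge_b):
    """Average the two judgments per criterion, sum, convert to percent."""
    averages = {}
    for name in CRITERIA:
        for score in (judge_a[name], judge_b[name]):
            if not 1 <= score <= MAX_PER_CRITERION:
                raise ValueError(f"{name}: score {score} is outside 1-10")
        averages[name] = (judge_a[name] + judge_b[name]) / 2
    total_obtained = sum(averages.values())
    total_possible = MAX_PER_CRITERION * len(CRITERIA)
    return 100 * total_obtained / total_possible
```

For instance, if the first reading gives 8 on every criterion and the second reading gives 6, each average is 7, the total is 56 of a possible 80, and the percent score is 70.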

based''on“pCM!!r introducing a bias 

have a tendenrv i student performance Some teachers 

students higher 

more likely to produce The d™,b^ 

establish sconnv 1. 1 ^“"e in an effort to 

ovemle same ont *31) order to 

interval 1-10 scale response scoring The 

structure of the scorinn It'^ Judgment within the 

the teacher to make eteht md scoring sheet forces 

entire scoring outcome to be^f"''d " 

teores from the twruj!” ™ judgment The 

the averages summed m give a rn't^Ko averaged, 

'erted into a percentaee u 5^°^® total score con 

possible score and multiplying by into it the maximum 

guard against subjecung^studemr'!'* important to 

undisciplined sconng s/stem Tb! ® subjective 

and an outline or model resoonse”'? t scoring system 

liaeh (gue the student back’^a r “r detailed feed 

lietailed comments) *> 1 ==' 

“"'a“tl'ent,cpartofthe:toobntprm^^^^^^^^^ 

Ordinal Scale Th<» 

TfSh “sa”resnin”‘? substitutes overall 

bm '■HU assigning ^r7oSL cnterton 

order from k’ readina and then, 

■Uf-'icr xTictlt^ssT^'-S- - 

pile, the second besi v ■ ' - 


P'fH'thJ sc^Sn'd^ ““^’^^rr/tb e' ‘’’^d --uremerta 

"iird and so „ "'““Id be second “P ‘°P °f »’= 

up at the bolio7''"F"l’''' '“°'at essay res '’ust, 

Pu'lain of the pile Vou Sn '™“W and 

Ur If iwt) sets of ^ u H uote on a separate 

""J loarth or fulh ""Possible because of , 

f'liosen a. random) a second .Tme"'' 





sheet of paper of the order or ranking and then reread and rerank the responses, or have them read and ranked by a second person. All judgments should be made blind, that is, without awareness of who wrote the response. Each essay is then assigned the average of the two ranks. The last step is to convert the ranks to grades, which can be done by assigning the top so many percent an A, the next so many percent a B, and so on.
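A minimal sketch of the ordinal procedure follows. The text leaves "so many percent" open, so the cutoffs used here (top 20 percent A, next 30 percent B, next 30 percent C, remainder D) are purely hypothetical, as are the function names and the student labels. Essays with tied average ranks are kept in the order in which they were listed, which is also an assumption.

```python
def average_ranks(first_reading, second_reading):
    """Return each student's mean rank over two readings (rank 1 = best)."""
    return {s: (first_reading[s] + second_reading[s]) / 2 for s in first_reading}


def grades_from_ranks(avg_ranks,
                      cutoffs=((0.2, "A"), (0.5, "B"), (0.8, "C"), (1.0, "D"))):
    """Assign grades by position in the rank order.

    `cutoffs` holds hypothetical cumulative fractions of the class:
    the top 20 percent get an A, the next 30 percent a B, and so on.
    """
    ordered = sorted(avg_ranks, key=avg_ranks.get)  # best (lowest rank) first
    n = len(ordered)
    grades = {}
    for position, student in enumerate(ordered, start=1):
        fraction = position / n
        for upper, grade in cutoffs:
            if fraction <= upper:
                grades[student] = grade
                break
    return grades


first = {"Ann": 1, "Ben": 2, "Cal": 3, "Dee": 4, "Eve": 5}
second = {"Ann": 1, "Ben": 2, "Cal": 5, "Dee": 3, "Eve": 4}
ranks = average_ranks(first, second)
print(ranks)
print(grades_from_ranks(ranks))
```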

Nominal Scale. If you wish to -d^up five^S-des m 

respLseand ‘hen a-.gn h to ™e ^ 

first choose to read all th y 3 „a assign 

idea of their relative quality and ^ ordinal 

each to the category judged to 0 „„ ,a 

approach, the nominal app™acn is s fo^ance rather than 
are applied in total as a basis for J P approach 

discreetly and .udgments should be made blind 

As with the other ,wo readers In each case 

and either done twice or ^ J j,„g which of the five cate 

the judgment is a «Wgor‘«' ,n_relative not to one another 
gones each essay respon established 

but to the scoring cri eri represents more work 

ommended that taacters us^ consistency 

tion that Its use will imp 


I h.litv will be covered m Chapter 

ay response ^ because the teacher is func 

t — as any j^fluence Ho'V - taken to insure 

faTa measuring or consistency or accuracy) or 

ning as a m ^ objectivity t , ^re recommended A 

= ITtrTo tCend. nttt for some of 

mdependent iudg™;reent This second judgment car 
bjectivity in the 


INTER-RATER RELIABILITY





Figure 5.2a


made by the same teacher who made the first or by another person. Using the average or mean of the two judgments increases the reliability of the scores.

It is possible to compute the degree of inter-rater reliability by determining the degree of correspondence or overlap between the two sets of scores.* What is recommended is that (1) each essay be scored twice by the teacher or once each by two teachers,
0 judgments or scorings be as independent of the 
whose teachers work on the blind, i e , not know 

renewed “ has already 

m 

tive essay questions twice becomes a prohibi 

When such is the'r ^ possible to find a second reader 

your scoring rehabihty some judgment about 


Example 

tind scoring appXr In F!^rer2?°F^^^ *®^'=her’s notation: 

e student his or her scoring rules By giving bad 


objectives and cntena ^ contains the teachei 

the student his or her essay^with theVeach”^ niles By giving back 
check marks, underhnines^ itern ^ s markings on it (i e 

mfomation shewn in Fili^re 5 "b “hf , ® 

feedback to be able to^enofv 'rU ^ have sufficient 

required and of 11,,. ■aemity the nature of the .nfnrmnonn 


^indthe Actual Aesponses 0 ° a Student Anthropology Course 
(t) Name and describe l„e ... 

e ngs that make them very adaptabirt'o tcatures Of humar 

'^'^t'a^ceecnbetbee <= P- : 

PHhecus genus (5 pis ) ® ^VP'cal man of the Australo- 

Of Uie teacher 

* Technically, this would be accomplished by computing a correlation coefficient between the two sets of scores; for practical purposes it can be judged by eye.
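For readers who prefer to compute the coefficient rather than judge it by eye, the following sketch uses the Pearson product-moment correlation. The choice of that particular coefficient, the function name `pearson_r`, and the sample scores are assumptions made for illustration; the footnote says only "a correlation coefficient."

```python
from math import sqrt


def pearson_r(first_scores, second_scores):
    """Pearson correlation between two sets of scores for the same essays."""
    n = len(first_scores)
    if n != len(second_scores) or n < 2:
        raise ValueError("need two equally long lists of at least two scores")
    mean_x = sum(first_scores) / n
    mean_y = sum(second_scores) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(first_scores, second_scores))
    sxx = sum((x - mean_x) ** 2 for x in first_scores)
    syy = sum((y - mean_y) ** 2 for y in second_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return sxy / sqrt(sxx * syy)


# Hypothetical percent scores given to six essays on two blind readings.
reading_1 = [70, 85, 60, 90, 75, 55]
reading_2 = [65, 80, 65, 95, 70, 60]
print(round(pearson_r(reading_1, reading_2), 2))
```

A value near 1 indicates that the two readings order and space the essays in much the same way; a value near 0 indicates little agreement.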





Figure 5.2b  The Teacher's Objectives (Including Criteria) and Scoring Rules for the Two Essay Questions Shown in Figure 5.2a

Question 1 (analysis)
Objective: The student will be able to name and describe at least five important features that contribute to human adaptability to environment, drawn from among the following:
    brain: leads to tools, shelter, creativity, etc.
    stereoscopic vision: helps us see clearly
    eat meat or vegetables: helps us live more completely off the land
    delicate hand control: to make things
    warm-bloodedness: live in wide range of temperatures, climates, and humidities; lets us tolerate land and water
    social cooperativeness: helping one another to survive
    long period of caring for young: increases likelihood of survival
Scoring rule: One point will be awarded for each of the above points covered, up to five.

Question 2
Objective: The student will be able to describe the daily life of a typical Australopithecus, his or her description dealing with the areas of
    getting food
    sleeping
    culture
and emphasizing the
    routineness of life
    animal-like existence
    primitiveness of culture
Scoring rule: For each of the points mentioned by the student in his or her description, one point will be awarded.





Box 5.1


HOW ACCURATELY CAN ESSAY QUESTIONS BE SCORED?

In a classic study ol the (1033) asked 

which have been English Departments at West Virginia 

his colleagues in the History an 9 administered to 

University to write essay „_an,„es course The history pro 

seventy-Iive students m the genera! humanities 

lessors wrote the following ^imafplv 400 words life in a medieval 

a security or the lack ol It 

e uall"s<'-knighthoodor.ndustry 

d recreations or diversions 

The English professors wrote the following gg,3g,aphs or short 

TstesTaryou have read jhe Cdnlertury Ta/esv 

? a -- 

c Show that ja/es individuals or 

O^Are't Characters in the nrotogue and the re 

merely lypee’ seventy-live essay responses 

TWO weeks '^'^^g'^Cated to “rnr,ha exceptional care 

these scorings professors c Hearee of agreement 

given to the prep , j,* percent o 

was ahPP' W P regardless “gg ,ersa depending on which 

rtC:d'.?:m P-- “,-rpercen.var,od between passing and 





A Final Point. We have now covered the two most common types of classroom test items, short answer and essay. Reasons for use of essay items should not be that (1) they are the easier to write of the two types, and (2) fewer of them need to be written. Essay items should be chosen when and if the objectives to be measured call for the types of processes best measured by essay items, and when the teacher is serious enough about objectivity to use the kind of scoring outline, form, categories, and procedure described and illustrated in this chapter.


Additional Information Sources 

Wa"h Dc"'’r‘‘"“= 

Education 1971, Chap JO American Council on 

Solo'^n topro“ nVthe ^ 

^ Berg (Ed ) Evah^t social studies In H 

1“ ^u^'f ~ 



Self-test of Proficiency

(1) The left-hand column lists the four thinking processes measured by essay items. Match each of these items with its particular characteristic listed in the right-hand column.

    a. application
    b. analysis
    c. synthesis
    d. evaluation

    i.
    ii. applies criteria to judge an idea, a solution, a method, or a material
    iii. measures knowledge acquisition and comprehension
    iv. uses abstractions to solve concrete problems
    v. breaks down material into constituent parts and determines the relationship of the parts

the hicher cognitive processes should pro 

(2) Essay Items to measure ® student 
sent situations that arc nov^^ 

.e tost whether the student can apply knowl 

(3) Construct an essay item ,Psones to a specilic situation 

edge gained from a un 

(4) Construct an essay item show what charac- 

polentially conlusable anima P,„sr 

teristics they have m co 

(5) construct an essay „,aking a useful object out cl 

synthesis describe a P purpose 

materials not norma y , „pether the student can evaluate a 

(6) construct an essay 

solution to a polihca P ^ responses lor your class List 

(7) You are about to ae°ie scoring 

criteria that you wou represents a recommendation lor 

(8) Cheok each a'a'e™" f' ° 

-liable scoring of -ays 


Tread as quickly a= 

6 Sa'^scoring sheet 

c score separately by 

p TTd'nTmdethanhal. 

" the essay a. one hde 


e score blind 
f score twice 
g read between the lines 
h construct a sample response 



Chapter Six / Scales and Procedures to Measure Affective Processes


OBJECTIVES


c™«n,c, sample affective 


goals for the class 


“imudes, nameW Likert of 

“tljccinc checkhsl^bm^/“ scale, 

nominations ’ scale, and 

s'atemcnts ™'cs for writing attitude 

nn atiiiudeS”' ouffine for use in constructing 

? "'“"“'■'"e -'cms, 

^ and nominal?™' 

for attitude 



MEASUREMENT AND THE AFFECTIVE DOMAIN


The alfectuc domain thinking or performing 

human functioning m feel, ng and thinking are inter 

aspects (although m ^e ,haf major portions of our 

related) Of course ue are tif our every 

lives and energy are ' or nro^voke feelings to some degree 

day tasks “tiv , ties include o P exclude the 

In other words, it would be extrem , and capac, 

area of feelings from our definUion 

ties that are important ■" df ' wdh “ ^^etive states is att. 
Perhaps the most widely tir negalwe e^a uatwns 

tudes Attitudes are jetton tendencies with respect 

emotional leelings. and P" j"* Ballachey, 1962) Thurstone 

to social objects (Ktech, Crmchh ^1^^ positive or negat ve 

(1946) has defined an attitud t people have atti 

affect ("e , feeling) associated '.hers, toward things, 

tudes toward themselves, '“'”^/„„udes definitely influence the 

“"Sk.. ..1 

nentssuchasattimd«)^^^^ “^'TSme aTflready existing 

teaching hke themselves, „[ goal attain 

(^er proper 

STndi'tmnsT patter of classroom 

'''Tk with f m the Che7and has typ, 

abdlty to work ^jerable of the subjective variety 

behavior) to evaluation am r 

cally been subj recently systems are currently 

Many school distr whether or n”*® may be interested 

mg affective behavmr_^^ f "'ITas ^mt oTtL ovendl ind. 

Aemeasnmment of these ,3pao, of the affective 

“dual student nss-sme^ „ ,ke next chapter 

domain will bed ,vas describe 


Che taxonomy of 


I vviit* 

affeenvedonnuo was described on pages 4,^3 


Ito 





Box 6.1


SHOULD TEACHERS MEASURE STUDENTS' ATTITUDES?

whole of Ihe educalion of the 

Ho«ver education movement 

educatron has recently that affective or humanistic 

and lay alike vrpw classroom Many people, professional 

Children lnd°ai?reLtemb,''"'!K'T '“'"’''aWly recognise that all 
them to school each dav amf a attitudes, and values with 

education' lakes placed Wouldn'nt hri! ':, P"'"'®'’''®'' "affective 
provide teachers With fh<»« * better to acknowledge this and 

vnth a high degrel l ° “P' clacaticn 

happen by accident? Mien a chllH^ t""^, 'han to let It 

head In the sand and pretend nm i* Pf® hurt, we can't put our 

ether institutions in society to make°ch?M'^ "1° 
tng citaens aelHu||,||,n, alert consumers, parlicipat- 

hem learn about theiroOT Inner' wolf? 1;°'®'’*’°'' ''^'P 

rom time to time to measure some ^ 
be measurement can serve as a r^t ®o 

this time comes (and it has como? “““h child When 

knowing how to measure atlitude^’^' iTiore comfortable 


^utlom 


the affective d 

*^>2 purpose of this ch"^t was mai 

outco^ ®klns neceSarv ? the teach 

S'S “’>™' '•—ns wh 

adults tend f reactions tha^c^” measun 

“temen, off',^'°<>«neacttoorbe dll *"“T 

nonprotectivp r *^'Wren’s reaction about the mea 

““'■°ns ofien f?ir’““"""S children the! ‘ 
'‘“P ■" nnnrwll ‘etcher Hert "!*”^ ‘t>tik of exerc.s.i 

-''™-ngordesig„.„;S~;su^^^^^ 





(1) Use or adapt existing measures where possible. It is usually

reasonable to assume. hat pti,..hed^^^ 

usm^he^^^ever regardless of source (Existing tests are 

described in Chapter 14 ) nn their papers if 

(2) D° not have the students writy cm 

your purposes can be served wiuiu 

identity to share with their 

— rrtrrr— demse 

(4) Do a briefing and a debriefing (before 

anxiety measure be aware of and sensitive 

(5) In constructing your ov „,asure 

to the fears and anxieties 

might allect them ..r^ms on their feelings about your 

---t:^r;/n^^-^.,eagues.norder.o 
ScIpT^eS-ctr 

Keep these points in mind as you read the ensuing pages.


SOME GOALS OF AFFECTIVE EDUCATION 


_ behavior and everything that 

’’dprfv or”®d.sorderIy f^shton ^ offect.ve 

”^^Do you as a teacher hav= 6 3 l,„„t techniques to ele 

lucat.o’^^JohnDevveyflS®^^^^ 

ate the school of “ f*'“ , ^ character an affective 

Ts= '■ '“ ” 

tate as one ot tne y 

ay that ij like school 

. the student shou W herself 

. the student should 

.the student sh to work 

. the student should 





• the student should be respectful to other people and property,

• the student should be able to work independently without constant supervision,

• the student should be willing to help others,

able to'^anriv't*^ similar goals, are reason 

gests thaf thw' *“room Moreover, Kohlberg (1973) sug 

‘ha? we want fo 

mg as classroom goals some of the follow 

toto" lust ‘h"' P=°P'= antitied 

• the student will behe"”'"' n*^ who or what they are 

• the studemClt’ ' ’’nrnan rights 

individuals ■" ^'Snity of persons as 

mh=r“s7o™!ouM tav“tr“^^^ “h"' " ‘’'>’^""8 “wnrd 

you would have them behave toward you) 

From ih^sc kinHc (• 

have evolved more specific OTes,TucffM'''^^ goals, many teachers 

• student wmbeIble°toth't'''“‘’" 
•-r^^wtllhehevethatrorTrcr^^ 

•‘f^rtrrhtX^^rr-nherwork 

^ of any sort *"= essential dignity of physical labor 

j;;3‘“’‘"“''-“'^~~-ard groups 

a'tty'att?m1it"'° haltavior Wlh* th?'tM h beliefs, these are 
doorcase ol£.r"®jj“’ ““ease certain heh''“^' ‘“Pboitly or expli 
oan 'nvolve^ceiam?^'’ ™nnipulaUo?rof°? T”'’ 
gers and ‘dangers be sen^ir.sr^, * i feelings and beliefs 

all stud^ntj !o^ that the situa^^ Possibility of dan 

evaluate behau ° establish an ethical requires Encour 

be broaden ' Aem sS 

follow thereun of thf» ethical principles 

as Vh *at *t beliefs that 

student ns mstmctori£/"‘“‘h“ ns measurcr-as 

^rs a responsibility to the 






CONSTRUCTING SCALES 


,llv measured by the use of scales A 
Attitudes and beliefs are „„menc<./ umts that can be 

scale IS o continuum order to measure a 

US with a scale of order to measure such 

provides u w ,„on to ® values we have to con 

bodies o attitudes e subjectively assess the 

intangible «'“S ,^,ale that can be ujeu that 

Struct a nunter something » „i,vc,ral scales or of 

degree P^^^^elow lack the aUitudes and beliefs 

are While scales tha otherwise accessible 

cognitive sent s nation that >nj (We will con 

sider the issue of 






Likert Scale 


Two-point 

Scale’ 


|--------------|--------------|--------------|--------------|
strongly       agree          undecided      disagree       strongly
agree                                                       disagree


that most closelv^ll^" statement the student checks that option 

the scale and the statement m l, P ™ 

that the scale points have an ah’''l'' cannot assume 

«udents ) Consider the follow.nglampln“" °™ 

• This class was interesting.    SA  A  U  D  SD

• This class was boring.         SA  A  U  D  SD

Note that the line has been omitted and the words have been replaced by letters for convenience. The student is instructed to circle the letter that best reflects his or her opinion.

Scoring a Likert Scale. To score a Likert scale, on a positive item (e.g., this class was interesting) score SA = 5, A = 4, U = 3, D = 2, SD = 1; on a negative item (e.g., this class was boring) score SA = 1, A = 2, U = 3, D = 4, SD = 5. Add the scores to get a sum that reflects how positive the attitudes are toward the object or experience being rated.
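The Likert scoring rule can be written out as a short Python sketch. The function name `score_likert`, the way items are keyed, and the sample responses are illustrative assumptions; the point values are the ones given in the scoring rule.

```python
# Points for a positively worded item.
POSITIVE_KEY = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}


def score_likert(responses, negative_items):
    """Sum item scores, reversing the key for negatively worded items."""
    total = 0
    for item, answer in responses.items():
        points = POSITIVE_KEY[answer]
        if item in negative_items:
            points = 6 - points  # SA=1, A=2, U=3, D=4, SD=5
        total += points
    return total


responses = {"This class was interesting": "A",
             "This class was boring": "SD"}
negative_items = {"This class was boring"}
print(score_likert(responses, negative_items))  # 4 + 5
```

Reversing the key for negative items means that a high sum always indicates a favorable attitude, whichever way the individual statements are worded.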

hemlbov? “"t '**^""0 'o/lT *’aa “ne point 

• This ells lllll,l;*'=W'''wtng"nem‘”* ”*** 

• This clasT TES NO 

for the sm 

'Ist'’”!'" '•“ponse to™ °° *" general the Likert 

'■'clUiir term will no. 

«"in Us use in adjec 





scale is recommended for older students (high school, college) and the two-point scale for younger students.

To score a two-point scale give +1 for positive answers (yes or true) to positive items and negative answers (no or false) to negative items, and -1 for negative answers to positive items and positive answers to negative items. The resulting sum reflects the positiveness of the attitude.
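The two-point rule reduces to a single comparison per item, as the sketch below suggests. The function name `score_two_point` and the sample answers are hypothetical; answers are assumed to be recorded as "yes"/"no" or "true"/"false".

```python
def score_two_point(responses, negative_items):
    """Score +1 for each answer favorable to the object, -1 otherwise."""
    total = 0
    for item, answer in responses.items():
        agrees = answer.strip().lower() in {"yes", "true"}
        # Agreeing with a positive item, or disagreeing with a negative
        # item, counts as a favorable answer.
        favorable = agrees != (item in negative_items)
        total += 1 if favorable else -1
    return total


responses = {"This class was interesting": "YES",
             "This class was boring": "NO"}
print(score_two_point(responses, {"This class was boring"}))
```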

The adjective checklist provides a list of adjectives for describing or evaluating something and asks the student to check those that apply. For example:

• This class was
      ___ interesting
      ___ informative
      ___ worthwhile

Unless the adjectives chosen are simple or are explained in detail, the adjective checklist may be too complex for young children. The advantages of this approach are that it cuts down on extra verbiage, is easy to interpret, and helps students to learn about the use of adjectives to describe feelings, thus becoming a useful part of the educational process.

Scoring is accomplished by counting the number of adjectives checked that are indicative of a positive evaluation and subtracting from it the number of adjectives indicative of a negative evaluation. The result is the positiveness of the attitude.
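Checklist scoring is a count of positive checks minus a count of negative checks. In the sketch below the function name `score_checklist` is an assumption, and the negative adjectives ("boring," "confusing") are hypothetical additions, since the printed example lists only positive adjectives.

```python
def score_checklist(checked, positive, negative):
    """Number of positive adjectives checked minus number of negative ones."""
    checked = set(checked)
    return len(checked & set(positive)) - len(checked & set(negative))


positive = {"interesting", "informative", "worthwhile"}
negative = {"boring", "confusing"}          # hypothetical negative adjectives
checked = ["interesting", "worthwhile", "boring"]
print(score_checklist(checked, positive, negative))  # 2 positive - 1 negative
```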

A bipolar adjective scale is typically a seven-point scale linking an adjective to its opposite; it is used to evaluate or describe a particular object or experience. An example appears below.

• This class was
      interesting  ___ ___ ___ ___ ___ ___ ___  boring
      bad          ___ ___ ___ ___ ___ ___ ___  good
      pleasant     ___ ___ ___ ___ ___ ___ ___  unpleasant

This approach was pioneered by Osgood

measurement are t practice wo scored bj 

^‘“^Tbre pan^wi^i and . is .he 

labeling the spaces 


Adjective 

Checklist 


Bipolar 

Adjective 

Scale 





Nominations 

Procedure 


The scores for each adjective pair are then summed, the sum representing the positiveness of the attitude toward the object or experience in question.
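One way to carry out this summing is sketched below. It rests on an assumption that should be stated plainly: the spaces are numbered 1 to 7 from left to right, and a pair is scored so that the space nearest the positive adjective earns 7 points. The function name `score_bipolar` and the sample marks are likewise illustrative.

```python
def score_bipolar(marks, positive_on_left, points=7):
    """Sum pair scores so that a higher total means a more positive attitude.

    `marks` gives the checked space for each pair, counted 1..points from
    the left; `positive_on_left` says which pairs print the positive
    adjective on the left (an assumed convention, not from the source).
    """
    total = 0
    for pair, position in marks.items():
        if not 1 <= position <= points:
            raise ValueError(f"mark for {pair!r} is off the scale")
        if positive_on_left[pair]:
            total += points + 1 - position  # leftmost space is worth `points`
        else:
            total += position               # rightmost space is worth `points`
    return total


marks = {"interesting-boring": 2, "bad-good": 6, "pleasant-unpleasant": 1}
positive_on_left = {"interesting-boring": True,
                    "bad-good": False,
                    "pleasant-unpleasant": True}
print(score_bipolar(marks, positive_on_left))  # 6 + 6 + 7
```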

Lb mTasurrf!,rr‘*“';" “ f°™ of scaling- to 

events In this nror c, sju^gj^jj- feelings toward peers or 
mo?e thmgl “^'=‘=‘1 t° name one or 

of attitude Its most coSon ^ category 

ship patterns Oupcn^ne. i. measure liking or friend 

like best^ Who are ihp as Who are the three students you 

to each qu^ocn the In response 

classmates This informal wnle m the names of three 

friendship patenif eXj a" - “ 

assessing attitudes toward classeT^b"’f context of 

apply ® Classes, the following example would 

felt were the mosUnTeres'hng^'’'^ 

felt were the most borag''^* '“''cn so far that you 

Scoring would be done as follows: a course gets one point each time it is nominated as "most interesting" and loses one point each time it is nominated as "most boring." The score for each course is the number of "interesting" nominations minus the number of "boring" nominations.
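Tallying nominations is easily done with a counter, as in this sketch. The function name `score_nominations`, the ballot layout (one list of course names per student), and the course names are all assumptions for illustration.

```python
from collections import Counter


def score_nominations(interesting_ballots, boring_ballots):
    """Score each course as 'interesting' nominations minus 'boring' ones."""
    scores = Counter()
    for ballot in interesting_ballots:
        for course in ballot:
            scores[course] += 1
    for ballot in boring_ballots:
        for course in ballot:
            scores[course] -= 1
    return dict(scores)


# Two students' hypothetical ballots.
interesting = [["Math", "Art"], ["Math"]]
boring = [["Art"], ["History"]]
print(score_nominations(interesting, boring))
```

Because each score depends on how many students nominated a course, the results are best read as relative standings within one class rather than as absolute measures.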

Crete (smd '* has The well in making 

What « ntin^lrnlr 

-re .LTom fonSf Howe“r7h 

™'Bht not nTi f "“minations is riT c ™ “-r must make 
he or he "'""'c «>= same tSe "ti H “=™P'C' => ^‘rident 

™''d the sTudXf "> pCwTh''V"r ’,h 

--nating someo^fS™ what u TsXf hToTsfe 

‘oe frequency scale 





WRITING ATTITUDE STATEMENTS 


An attitude scale contains a senes of ^ 

which IS responded to by nsing ^ the 

statements about/" “’tject, two-point scale format 

test taker responds by using th ^xpenences themselves that 

Stimuli are the objects P®f®° ’ n aj,ective scale formats or, 

can be responded to by ° j^„,lh thi nomination procedure 

with slight adaptation, can b „ewhal more difficult to for 

Since (1) attitude f scales containing attitude 
mulate than stimuli, and ( ) concentrate our greater 

Edwards (1957), in his work on attitude scale construction, suggests criteria for the construction of attitude statements. These will be presented here along with some additional elaboration.

(1) Avoid Factual Statements.

An attitude statement should require the respondent to speculate or to project rather than to report facts. Factual statements are not attitude measurement and should be excluded from affective measures. Consider some examples.

. career educa.i- P^- 

developmental l/us education pro- 

. The price tag to ^^;;'j3,Pmed 

grams is too ^ ^ children when they misbehave 

‘ f of being punished by mj teacher 

(2) Avoid Reference to the Past.

Statements should be written in the present or future rather than the past, since attitude measurement relates

, the current stat below 

jects Consider ^^dcs when I wanted to 

, . I have always B to 

b“to ; ^cang-Thm^^^^^^ 

pou’' ■ got mad , ^p^d. m> mother will get mad 

better • 


Guidelines for 
Writing Attitude 
Statements 


poor 

better 

poor 

better 





Am 1'!“" *= past measure self reports of behaviors 

Se leMtl'Sf f behaviors may be 

of he st,X i"" 'bay do not reflect the attitudes 

prmect We t I 'bat the student 

on what has han”"^ onself into a situation rather than reporting 
of that judgmenniiM ” **'n* 'bo speculative nature 

dent’s attiWs a * '’™'' “ ® ”"‘^“’8 “ba' a stu 

An atulnA™'? interpretations 

K .r;rotkes mom r^' .nterpretable in only a single way 

know how to respond to «" that is"'^h’?*‘““’"' """ 

to Some examples may help clarifyTh.spomr'''''''™ '‘“P™'' 

’ Mv 'teach? bigh levels of performance 

pcrformweir"“* students to 

[ ' am my own worst enemy 

manTs«uatTonr>' «« -nyself in trouble in 

could be interpreted as represM^n? performance ' 

ation and support or a neeative * ^ Positive expression of motl 
tearing and thus not lead a m. 7 ? excessiveness and over 

h.s or her true to use the state 

Attitud'e ''■'■c'c'-anciej 

“bje?ra?e™?aA^^^ ">ward a specific 

AU of ;h"“’“<bke"fr.^™? 'bba a thing (like "a 

appear below ““““-O-'-on are irr^le^a'nVsLt^e^ImS? 


poor 

better 

poor 

better 


poor 

better 

poor 

better 


(or „de, to school isapleasr 

* The PnncipaT*,?* “^'OS '° going to scho 

• Tl'Ia?’P“".veof,e?ch?r's building 

<eachcrs“P“' 'b'^ school is very supportive 





Naturally, what is irrelevant on one instrument may be relevant on another, depending on what the object of measurement is. Check your items against the object toward which you intend to measure attitudes in order to be sure they all are relevant.

(5) Avoid Nondistinguishers.

Do not include statements that every student is likely to agree with or that no student is likely to agree with. The purpose of an attitude scale is to distinguish between people holding favorable attitudes and those holding unfavorable ones. Statements that do not distinguish between various attitude positions are of no value whatever on an attitude measure.


poor    • I would rather go to school than do anything else.
better  • School is one of my more pleasant experiences.
poor    • The United Nations has an important responsibility in the world.
better  • The United Nations makes me feel secure about the future of the world.


(6) Cover the Full Range.

The set of attitude statements should cover the full range of the object, person, or experience toward which attitudes are being measured rather than just one or two facets. To ensure such coverage it is best to begin with a content outline that lists the areas in which the attitude will be measured. Consider the one below for measuring attitudes of educators toward the open classroom.


philosophical appeal 
positive British results 
too much freedom 


^rndenthenehtstrcm^Cassroom 

maximal use of resouwK 

tummer costs to system 

:eTedrle\‘b®J“5n«1ew:^^^^ 

,/7Th: cement outline might include 

of Other students the total school climate 

effects of the teacher meeting self needs 

^ —interpersonal 
—instructional 





better 

poor 

better 


(7) Write Simply, Clearly, Directly.

Do not use big, confusing words when they can be avoided. Attitude statements should be easy to understand. Remember, attitude scales are not meant to be measures of intelligence; they should not be difficult to understand. Below are some examples.

poor    • As a subject, chemistry provokes my strong involvement.
better  • I like to study chemistry.
poor    • People's treatment of other people should reflect a reciprocal concept of justice.
better  • People should treat others as they themselves want to be treated.

AttiioHl 

hong attitude'SatementT''* exceed twenty words 

utber rules than rr^htr™ ^ ^itt iel^ ^ 

* Uon VwSh''"^* ’’"“If in a situa 

Xroe^s,!^ n 'tilte advantage of an 

do so ^ ^^ti usually be expected to 

Basically people can t be trusted 

Hud^dtlfp::!,--'- -e are likely to 

tolerate different Inds of^b'eh 
PeoDle hav« rs *'• ot behavior 

pie have a right to do things their own way 

B an attuudrs°“i®^*'‘'''®'"to'ne>il 

know "rn Bifc-ence"' are tT (auch as people 

bctler . atodents'’fairto* subject matter and 

iZr : 

A am easy to hke ^ ^ easy to like 


better 

poor 


better 





If you write an item that contains two thoughts, break it into two items, with each item containing one of the thoughts. In this way, each item will elicit only a single reaction from each person.

(10) Avoid "All," "Always," "None," and "Never."

These universal words often introduce ambiguity into a statement or else render it automatically acceptable or unacceptable. At best these words add nothing; at worst they create either confusion or uncertainty.

poor    • I never met a person I didn't like.
better  • I often feel I'm acting rather than being true to myself.

When you are tempted to use universal words, try to substitute words such as "often" or "most." These words avoid the pitfall of eliciting universal agreement or disagreement.

( 1 1) I/se "On/y," andTan*ohen be eliminated 

rhese words also introduce amb gu y ^^j the rule is that an 

^rn^that they must of qualifier word in 

entire attitude scale not con 

■very statement ,ljot people can 

, Organized raligmu 

express them faith that people can 

I Organized reiig 

express their ai trusted to be fair 

. Teachers “ trusted to be fair 

, Teachers can. be tn. 


poor 

better 


^“ur ^ would no. neces 

tily invahdat t j Its . j ,ts accept 

oe or t-fJf ‘°"„f„S .hisprohI=m 

■■just" helps also be applied to 





Figure 6.1


poor 

better 

poor 


better 


(12) Use Simple Sentences.

The one-thought approach is also the simple sentence approach. By avoiding sentences with clauses attached, statements can usually be kept simple, direct, and to a single point.

poor    • All other things being equal, a person's fate can be determined by how hard he or she works.
better  • Hard work insures a person's fate.
poor    • Vocational education programs, particularly those that lead to immediate employment, are highly desirable for the local school to offer.
better  • Schools should offer vocational education programs.

(13) Avoid Double Negatives.

serves only™ make''dM*'s'^**V' ® sentence 
Such sentences can often tnore difficult to understand, 

the negative words can be ^TOeht^of”'"’ since 
poor . No „ A otincelling each other. 

rights **"' respect student 

poor • No*mrrcoul“f°°’;“'’^‘^'"'“'^o"‘ rights 

good ='=0“‘ ttiysolf would not be 

ho'tor • loan only say good things about myself 

S 

( 4 | Avoid '"'o'Ptetations 

,, . O'O relevancies 

HO) Avoid °alf' 

(") Use ' only," . ""one." and "never " 

',3! “^“^Plosenteiloes™''''''' =’0 • sparingly 

(’3) Avoid double „ega„4 

*Ed\sards, 1957 





In summary, then (see Figure 6.1), one should write attitude statements that are quite simple and direct, that are to a single point, and that are not likely to be agreed with or disagreed with by everyone. Taken together, the statements should represent as wide a range of attitudes as are present in the group of respondents.

CONSTRUCTING A LIKERT AND TWO-POINT ATTITUDE SCALE

The purpose of this section is to describe the procedures used in constructing the Likert and two-point types of attitude scales. Typically this category of scales is revised following its first use to eliminate poor items, based on item analysis, which will be described later.

Before starting construction of the scale, the teacher or other test designer must decide what the purpose or objective of the attitude scale is, that is, the object toward which attitudes are to be measured (e.g., self, school, teacher, chemistry, team teaching). The first step assumes that you have identified the attitude area to be measured.

(1) Preparing an Outline. The topical or content outline was described briefly on page 149. It is a delineation of the attitude area into topics or subareas that, taken together, constitute the various aspects of the attitude area to be measured. Suppose, for example, that you want to measure high school students' attitudes toward the emerging and changing role of women in conjunction with a program dealing with male-female relationships. The categories below would serve as a useful outline of topics for this particular attitude. (Item numbers refer to Figure 6.2.)

Attitudes toward the emerging and changing role of women
    (1) relative superiority (items 1, 3, 13)
    (2) division of labor
    (3) claim to education
    (4) social roles (items 8, 14, 17)
    (5) career stereotypes (items 15, 18, 21)
    (6) in-school activities







provides speciHc targets or topics 
^ outline avoids writ- 

Ihe ^ f^uch as attitudes toward 

outline the women). The more detailed the 

m^:in iC2V appropriate attitude state- 

role were essentiair attitudes toward women's changing 

tudes toward varinn as being the collective result of atti- 

man'srole relative of a woman's role in relation to a 

interaction career •Upr of labor, education, social 

styles. ' ‘ ®ohool activities, clothes and 


attitude survey 

Msiruefions G$lo 

“S'M Wiih It, Cl, 51 . If you strongly ag,eo wifh ih ^T'' 

™l a test There are no “"cle D, and if von .1 alatement, circle SA; il you 

things Vou 01 wrong answers This s a 171 . 7 "® *' SD This Is 

-’'--res 

* D SD dishwanh*"" 'Ike 

tliohwashing and cleaning 

«\ P... 


,eadere,„3,,^„^„^^ SA a 0 SD 

SA A . '’:s:g’rcr^™~-n=s 

(Sltfsenl^ ^ ° 

SA . „ ’"-"nTe'hL^^^ 

--'"S t '.S'e^X'" ,a, The hovThould pay ,h 

SA A n 1®'' 'he cost cl a date 

“ A 0 SD 





(9) Girls should wear dresses or skirts unless they are doing something that really requires them to wear slacks.
      SA  A  D  SD

(10) Boys should be allowed to study home economics or early childhood education if they want to.
      SA  A  D  SD

(11) Parents should make as much effort to give their daughters as good an education as their sons.
      SA  A  D  SD

(12) Colleges and universities should have quotas based on sex for accepting students.
      SA  A  D  SD

(13) Girls are the weaker sex.
      SA  A  D  SD

(14) A girl shouldn't destroy a boy's ego by winning games like chess or golf.
      SA  A  D  SD

(15) Girls who want to study engineering are not feminine.
      SA  A  D  SD

(16) A boy who has his hair styled by a men's stylist must be a bit odd.
      SA  A  D  SD

(17) It's all right for a girl to ask a boy for a date.
      SA  A  D  SD

(18) It's better for society if men and women have clear-cut work roles and know what careers they may or may not select in life.
      SA  A  D  SD


„ „an.s .0 9e. -'-d whda (=0) " ‘ 
" :/r.n ccege, .he gjH ^o-dd ga,. 

rrrhfrin" - a d sd 

SA A D SD 

This scale was developed by the Educational Evaluation Group of Highland Park, N.J., 1974, and is reproduced here with their permission.

ct^tcmcnls. Once the topical out 

-rte aS half of %‘Zp,c anrl halt rn a con or nega.rve 
irection with respe 





agre7revt™\ '’‘‘®t or strongly 

or misunderstjTift^”V or she is lazy or hostile or asleep 

Sion of beine omte f'^ "'"'I “"'’^5' the misleading impres 

over about Llf „f I'’™'''! >he topic If how 



will just cancel theme l' SO *n each direction such students 
live Morever the channe^f nt* Positive and half nega 

-ntulat .0 con items-may he 

(positiveness ofTttitlidertoSd'' h^“ illustration 

relative superiority ThmP f ^ changing roles of women) is 
good negative statement and Positive statement and a 

the thirteen rules presented in Ihe °uu of each adhering to 

■ °/ofTL“tT ™ ‘ho'subtopfc of® eIaUvl“s"pL°^^^^ 

Randomly lo(ate’'posmvt'’a"d „’’egau ‘ together 

her to produce a good mix ® "'"h vospect to one 

Afteralltheitemshavebeen 
■ty i vo'!“ ''“■ant of eilhLT the Likert 

generarih'''''*™'^ R-v elemenla^ level of matur 

ondaiy srtool'™? '''■ l''^no appr^ch fs students 

•nformatinn or adults the T v ^^^onimended For sec 

llv mS 'f "f 'he LikJr, scale ot" 
responses will category off tli leave the mid 

“te 6 2 i^s a tfr “ una7om„ o\" "h''= u 

your format and^T “ -noddled ) Onr '""""'“■on m Fig 
your attitude « ^ritten your item-t chosen 

responses to imn ^ ^ ^^rocedures for re ready to try 

development 
■A.) you win w, -'t-' 





The attitude scale for measuring the positiveness of attitudes toward the emerging role of women, from which the previous examples of a content outline and statements were drawn, appears in Figure 6.2 as an illustration of the Likert approach. The illustrative scale has been given the title of "Attitude Survey" rather than "The Emerging Role of Women Scale" (although that is its topic) so as not to oversensitize the students to the scale's topic and thereby influence their judgments.


A Sample 
Attitude Scale 


CONSTRUCTING ADJECTIVE ATTITUDE SCALES


An adjective scale is composed of (1) a stimulus word or phrase
for the students to react to and (2) a list of adjectives by which to
react. The same list of adjectives can be used with different
stimulus words; only the stimulus word or object to be rated needs
to be changed.

To construct an adjective scale, choose the stimulus words or
object first. (A sample list of stimuli appears in Figure 6.3.)
Osgood, Suci, and Tannenbaum (1953) distinguish areas of meaning,


Figure 6.3  Sample Stimuli for Measuring Attitudes in a Variety of Attitude Areas

Attitude Area
  Self-appraisal
  School sentiment
  Specific experiences
  Current events
  Educational programs
  Interests
  Other people
  Activities

Sample Stimuli
  School, Today's field trip, Helping others, The U.N., Career education,
    Baseball, Classmates, Doing homework
  This movie, Hard work, Organized labor, Open classroom,
    My group leader, Watching TV
  My teacher, Today's speaker, Independence, Common Cause, Team teaching,
    Birdwatching, My friends, Helping Mom





activity {e.g„ active-passive). An 
appear in a *“'<= “f ihe Osgood et al. variety 

(and its scoring? of .V.*" •*’'= of an attitude scale 

These types of sU are%Xd 


Semantic 

Differential 


Osgood et al. (1953) provide a lengthy list of adjectives from which
the twelve adjective pairs shown in Figure 6.4 were taken. Of the
twelve, those five that measure evaluation (here, of "behavioral
objectives") are dirty-clean, good-bad, unpleasant-pleasant,





Figure 6.4  A Sample Semantic Differential

Behavioral Objectives

 (1)  DIRTY       ........  CLEAN
 (2)  SHARP       ........  DULL
 (3)  GOOD        ........  BAD
 (4)  STRONG      ........  WEAK
 (5)  RUGGED      ........  DELICATE
 (6)  UNPLEASANT  ........  PLEASANT
 (7)  HONEST      ........  DISHONEST
 (8)  PASSIVE     ........  ACTIVE
 (9)  BEAUTIFUL   ........  UGLY
(10)  LIGHT       ........  HEAVY
(11)  LARGE       ........  SMALL
(12)  SLOW        ........  FAST

Scoring
  Evaluation = (item 3 + item 7 + item 9) - (item 1 + item 6)
  Potency    = (item 4 + item 5 + item 11) - item 10
  Activity   = item 2 - (item 8 + item 12)
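The scoring rule in Figure 6.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each mark is taken to be coded 7 for the leftmost space through 1 for the rightmost (the coding described for the Tuckman form; Figure 6.4 itself does not state one), the item groupings are read from a damaged copy of the figure, and the names `FACTORS` and `score_semantic_differential` are hypothetical.

```python
# Items whose scores are added (positive adjective on the left) and
# subtracted (negative adjective on the left), as read from Figure 6.4.
FACTORS = {
    "evaluation": {"add": [3, 7, 9], "subtract": [1, 6]},
    "potency": {"add": [4, 5, 11], "subtract": [10]},
    "activity": {"add": [2], "subtract": [8, 12]},
}


def score_semantic_differential(responses):
    """Return one score per factor.

    `responses` maps item number (1-12) to the coded mark.
    The 7-to-1 coding is an assumption, not part of Figure 6.4.
    """
    scores = {}
    for factor, items in FACTORS.items():
        added = sum(responses[i] for i in items["add"])
        subtracted = sum(responses[i] for i in items["subtract"])
        scores[factor] = added - subtracted
    return scores


# A hypothetical student who marks the middle space on every pair.
neutral = {item: 4 for item in range(1, 13)}
print(score_semantic_differential(neutral))
```

Because the three factors draw on different numbers of added and subtracted items, their raw scores are not on a common scale; a teacher comparing factors would need to keep that in mind.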

honest-dishonest, and beautiful-ugly. Potency is measured by
strong-weak, rugged-delicate, light-heavy, and large-small; activity
by sharp-dull, passive-active, and slow-fast.

You will note in Figure 6.4 that some of the adjectives have been
written with the positive end on the left and some with the positive
end on the right.

part of positive and negativ English those adjec 

Scale SiLe people read from f^ considered positive 

'lit are considered negative 'em* 

the Likert scale an app«x» ^ f 

of P-‘“^“;laTife ilenis can both be sc^edj 2 

positive Items to arri 

Figures 6 4 and a.ffermtial was explained by having 

E»“L?atresaUbu..s4^^^^^ 





pairs'^men'" fn'" differential, you should ( I ) use the adjective 

and TannenbauilfJ'lKS) ta^oJ m 

smde Dair rtr f,t,^ ' II ' ”**'.P*^ make up your own pairs A 
a dozen adjective Da‘iil”is‘/“"^“„'" ‘‘dequate reliability. 

The seniantir*rt ff S“^'ly a good number to work with 

aasessmV:“’?ru‘re'of ^hff /"'r 

shown m Figure 6 4 oftPn a general adjectives 

reducing the tendencv desirable response thereby 

mended'way It IS a recom 

with their students' attitude tbemselves in frequent contact 
for feedback purposes ^ variety of issues, particularly 

that IS used for descnmioIT^Tf semantic differential, one 

colleagues or other obLrvers i, behavior by students, 
pnmanly to supply the ^ P“cpose of this descnption is 

been specifically chosen to fit the f ^”‘1'’^'^'' '■'he adjectives have 
” 'be sconng instructions* (Fi^re fi'sbf clusters shown 


Figure 6.5a


Tuckman Teacher Feedback Form



* ActualN a HARSH 





 (9)  UNFAIR              ........  FAIR
(10)  CAPRICIOUS          ........  PURPOSEFUL
(11)  CAUTIOUS            ........  EXPERIMENTING
(12)  DISORGANIZED        ........  ORGANIZED
(13)  UNFRIENDLY          ........  SOCIABLE
(14)  RESOURCEFUL         ........  UNCERTAIN
(15)  RESERVED            ........  OUTSPOKEN
(16)  IMAGINATIVE         ........  EXACTING
(17)  ERRATIC             ........  SYSTEMATIC
(18)  AGGRESSIVE          ........  PASSIVE
(19)  ACCEPTING (people)  ........  CRITICAL
(20)  QUIET               ........  BUBBLY
(21)  OUTGOING            ........  WITHDRAWN
(22)  IN CONTROL          ........  ON THE RUN
(23)  FLIGHTY             ........  CONSCIENTIOUS
(24)  DOMINANT            ........  SUBMISSIVE
(25)  OBSERVANT           ........  PREOCCUPIED
(26)  INTROVERTED         ........  EXTROVERTED
(27)  ASSERTIVE           ........  SOFT SPOKEN
(28)  TIMID               ........  ADVENTUROUS


*Copyright 1971 by Bruce W. Tuckman


Tuckman Teacher Feedback Summary Sheet    Figure 6.5b

The scores on those items that are written with the negative pole on
the left are subtracted from those with the positive pole on the left.
To avoid having the resulting score for a factor be a negative number
(which will happen if the sum of scores to be subtracted, on the right,
exceeds the sum from which they are subtracted, on the left), a number
(either 18 or 26) that is greater by one than the maximum possible
negative score is added.

(1) Item Scoring

a. Under the last set of dashes on the sheet of 28 items, write the
   numbers 7 6 5 4 3 2 1. This will give a number value to each
   of the seven spaces between the 28 pairs of adjectives.
b. Determine the number value for the first pair, original-
   conventional. Write it into the formula given below on the
   line under item 1. For example, if an X has been placed on the
   first dash next to "original" in item 1, then write the number 7
   on the line under item 1 in the summary formula on the next page.





"The Adjective 
Checklist 


^9Ure 6 6 


' Su’eL.r.re/lX'’ '’™= 

malj tomllla °' ‘’'"’^"=''>"8 In Ihe sum- 

f cZv,^™"'" °'™ns,ons 

Item (1 + 5 + 7 + 16) - (6 + 11 + 28) + 18

(__ + __ + __ + __) - (__ + __ + __) + 18 = ____

Dynamism (dominance and energy)

Item (... + 24 + 27) - (15 + 20 + ...) + 18

Warmth and acceptance

Item (2 + 8 + 19) - (3 + 4 + 9 + 13) + 26

(__ + __ + __) - (__ + __ + __ + __) + 26 = ____
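The summary formulas can be sketched in Python as follows. The sketch covers only the two formulas that are fully legible on this sheet; the key `"creativity"` is an assumed label for the first formula, and the function names are hypothetical. It also checks the stated reason for the constants: with marks coded 7 to 1, the lowest possible factor score comes out to 1 rather than a negative number.

```python
# (items added, items subtracted, constant) for the two fully legible
# formulas. "creativity" is an assumed label for the first formula.
FORMULAS = {
    "creativity": ((1, 5, 7, 16), (6, 11, 28), 18),
    "warmth_and_acceptance": ((2, 8, 19), (3, 4, 9, 13), 26),
}


def factor_score(responses, added, subtracted, constant):
    """Sum of added items minus sum of subtracted items plus the constant."""
    plus = sum(responses[i] for i in added)
    minus = sum(responses[i] for i in subtracted)
    return plus - minus + constant


def score_range(added, subtracted, constant):
    """Lowest and highest score a factor can take with marks coded 1-7."""
    worst = {**{i: 1 for i in added}, **{i: 7 for i in subtracted}}
    best = {**{i: 7 for i in added}, **{i: 1 for i in subtracted}}
    return (factor_score(worst, added, subtracted, constant),
            factor_score(best, added, subtracted, constant))


for name, (added, subtracted, constant) in FORMULAS.items():
    print(name, score_range(added, subtracted, constant))
```

Both factors therefore run from 1 to 43 despite having different numbers of added and subtracted items, which is what the two different constants accomplish.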

^ •“ t= applied 

;»ward oJXat Tv'L!.an''’r‘c" person s attitudes 

rabilitv!^f''^'\4« used and the student ^ '>olh positive and nega 
me thns adjective to the obnJt* “PP'* 

tive iiema -- ■" the oppos;^ d"rtrf'r Set^f 

^>'0 1 am Farm 

F°r™ample"'aJ' '"'“'ds can be 

am a mean person" 70”^' °' Vuu are^aV° describe people 

bow well each worded aaoh wort one Pa^Od" or I 

doooribes you Wu 70101 ( 0^ ,° ^ ’'me 7 h,nk about 

"'= wor cLsmaieTi?” CUsV^a^L'’'’'" *" d^db word 

space next lo It o u.,’’’’®!’ "’d" " Ms you do n a word 

same degree make*a 0^"'' ® '"°'d tits you and' '’"'"brng in the 
bs"-IPanyo„,, 5 ;d P «k In the spaol If youT k 
'« mate two checks m Ihl spLe 





Here are some examples:

My classmates are smarter than I am (no check).
I am as smart as my classmates (one check).
I am smarter than my classmates (two checks).

My classmates are meaner than I am (no check).
I am as mean as my classmates (one check).
I am meaner than my classmates (two checks).


smart ____          nice-looking ____   impolite ____
happy ____          annoying ____       caring ____
mean ____           neat ____           generous ____
friendly ____       moody ____          snobbish ____
jealous ____        talkative ____      honest ____
fun ____            pushy ____          responsible ____
noisy ____          bad ____            lazy ____
likeable ____       nice ____           sneaky ____
troublesome ____    popular ____        important ____
helpful ____        lively ____         witty ____


CONSTRUCTING A NOMINATIONS FORM


Our last type of instrument for measuring students' attitudes is the
nominations form, an instrument on which each student names or
nominates fellow students, teachers, subject matters, events, and
so forth, that are perceived as fitting certain categories. This
procedure is most useful for determining popularity, friendship, or
liking. It is often used in conjunction with a technique called
sociometry, which is the measurement of classroom social patterns
based on nominations of liking preferences. Sample questions for use
with the nominations procedure appear in Figure 6.7.

My three best eat lunch wrth are 

The three people students are 




Figure 6 7 





The three classmates I would most (least) like to bring home are ______
The three teachers I like best (least) are ______

The three most (least) liked teachers in this school are ______
The three things we did in this course that I liked best (least) are ______

A nominations form for measuring the preferences of students for
classmates is illustrated in Figure 6.8. This form is scored by
counting the number of times each nominee is named. Students were
asked to name the classmates they would most like to choose in order
to identify possible social stars. The form thus measures preferences
or liking rather than avoidance or dislike.
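Scoring a nominations form by counting how often each nominee is named can be sketched as below. The ballots and names are invented for illustration, and the choice to count a nominee only once per ballot (even if a student writes the same name twice) is an assumption, not something the text specifies.

```python
from collections import Counter


def tally_nominations(ballots):
    """Count how many ballots name each nominee.

    `ballots` maps each nominating student to the list of names given.
    A nominee is counted once per ballot (an assumption).
    """
    counts = Counter()
    for nominees in ballots.values():
        counts.update(set(nominees))
    return counts


# Hypothetical answers to "The three classmates I would most like to ...".
ballots = {
    "Ann": ["Ben", "Cal", "Dee"],
    "Ben": ["Ann", "Cal", "Dee"],
    "Cal": ["Ben", "Dee", "Eve"],
    "Dee": ["Cal", "Ben", "Ann"],
    "Eve": ["Dee", "Cal", "Ann"],
}

counts = tally_nominations(ballots)
for name in sorted(ballots):
    print(name, counts[name])
```

Students with the highest counts are the possible "social stars"; a student whose count is zero or near zero has been named by few or no classmates.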

Figure 6.8  A Sample Nominations Form for Measuring Preferences

°hTn“"'’ prwn'imr'thrna ^ 90851100 as 

like to mvde^'''"^ “ ’’“'T 

/ or girls you would most 

scale ~~~ ' 

of »' "problems', hal' “"'^''nsr'xr st’n'TT" '^f^ssroom 

'=tePter, .„clud,ng a briefing 



Using an Attitude Scale 165 


Although a teacher cannot give students Yen' much informa 
tton tn advance about the purposes of an 

test without unduly influencing their response a ‘=^er should 
inform students of the safeguards that 

.i_ 'Tinar, tlie test has been completed, tne teacner 

=;.-i SS'i- sr-i * — * 

Some of the purposes for which a teacher might construct and
use an attitude scale are the following:

(1) To help students acquire more self-awareness. Students may be
helped to see more clearly their own feelings about people and
events.

(2) As part of an affective learning experience. If the teacher is
attempting to influence attitudes, the experience may call for
attitude measurement as part of its procedure or in order to
evaluate its results.

(3) To provide students with feedback. Students may be helped to
clarify their feelings about an event (for example, a national
political occurrence or a specific school event) by completing an
attitude instrument and having the teacher report the total class
results.

(4) To provide the teacher with feedback for self-improvement.
Knowing how students feel about specific events or instructional
experiences can help the teacher improve them or consider dropping
them.

(5) To gain information upon which to base interpersonal decisions.
As a person concerned with the education of the whole child, the
teacher may find information about friendship patterns or self-
concept or school sentiment highly useful in guiding his or her
actions. The teacher may be helped to discover which students need
greater support and encouragement to think well of themselves, for
instance. (It must be noted that this last use is probably the most
delicate of all. The teacher would do well to consult a school
psychologist or seek additional training before embarking on this
venture.)





‘ '"‘"''"f" “"‘I cmoiions manifc^lcd 

inton .n'T 'T.' to'^rcl .he school sc.inn and ,hc,r ab.hn lo 

— n. of 


Additional Information Sources 

Appl,o^„cttI^''STl 9 S 7 

Lake D G Miles M n 4 naric R n li 

3'ool5/ori;ieQsj.,.,„„, , ' bthmior 

lege Press 1973 ' N k Teachers Col 

(2nded) Nk David*MT*^**e^' ‘Icsigrt anj social nteasiiremeti: 
Robinson J P etal Vfll 

(iOiiel cliaraclenslics Am oeeiipe 

RobrnN'^p-'riv'"'^" 

Inslilule for SociarRelelrc'h Arbor, Mich • 

Robinson 3 P 4 Shascr P R ’''"'"Ean 1968 

'Oder Ann Arbor Mich Insliiuirr" e^ '’'kcliofoticol altl 
of Michigan 1969 ' ' ®“nal Research, Uniicrsits 

boaw M E 4 Wright J M c , 

NY McGtats Hill 1967 ™ '’’casiiremciil o/ amtudes 



Self test of Proficiency 



(1) Write two goals that deal with students' attitudes toward school.

(2) Write two goals that deal with students' interpersonal behavior in
school.

(3) Match the sample item at the right with the type of attitude scale it is
illustrative of at the left.

Yes No 

j, Math 

exciting tedious 

ill Name*your favorite subject 


Likert Scale 
b Adjective checklist 
c Bipolar adjective scale 
d Two point scale 


,v Of all my courses I like math 
best 

SA A U D SD 

V Math IS ^ exciting 

tedious 

unpleasant 

(4) The instrument that uses the bipolar adjective approach to measure
attitudes is called the ______.

„eetow IS a, isto, sentences CheoK those that are rules , or 

3„„udes.atements^t^r,ncetothepas. 

h Avoid nondislinguishers 

^^'-'crotrard^ectty 

d Wd'd ” thoughts per statement 

r W^nTossible use compound sentences 

g cover the full range 

„ Write short statements 

(6) consider the (ollcwing attitude stetem^^^^^ 

a Classrooms m 

physical walls ,ake and my teacher teaches 

“ tt :r '^^l e What IS »rong with each statement and Why 

-Sul-::::-. 





'“ard your 

M o' 'Ms atlilude area aboul 

wjiich you could ask questions 

® a"?' "Megones (each c( 

a.l..udes,oward.heore7FrEfDTRlPS 

to^measlre* 00501'“'" 'Mss negalive, 

TRIPS Use the toolll" n '“'"'S"' "i® Pl=|dP' HELD 

positive and one neoative"! 8 and write one 
one negative item per aspect or outline category 

items''l‘d'roTrnatlonsTueT' “dieotive checklist 

altitudes toward the oblect F|1 D toTI" 

line from item 8 TRIPS Again use the topical out- 

« '0 provide student 

0 to ml "'«> '«glok 

0 'P lin^M l“em"‘ ' “"“"‘"y '0 others 

= 'P hPIP atPPont gam 1^:1' 

(12) You have been asked to i. h 

garding the use ot atlective'mlsil"'"'"'' ® '““"T meeting re- 
0 me 01 your presentation to inri"H olassroom Prepare an 

"PCS necessary In testing ,0 111 , 0 ™^" a statement ol the precau- 



objectives 


chapter seven / Checklists
and Scales to Measure
Performance and Behavior


„t=r,a f„. choosing 


IW QO 


t'l-iiuiiiiance 

and test- 
es and produa OTena''’“''’“ containing proc 
cvaluSe™'^ describe classroom behaviors to be 
; '"‘’“■■"'’^^-ratingscale 
r-^-orbehanormeasiimmentin the Cass- 



WHY MEASURE PERFORMANCE AND BEHAVIOR

Although some people see the teacher’s role m measurement hmited 
to paper and pencil testing administered on a " 

frequently say that they do no. r^y on su^h 

for student assessment In addition, in y 

formance and behavior on ajayt^^ay^^^^ 

rnurouTsSS^^idences may he mom -1 .h- 

test performance for both the ° ^ „ays of measur 

Thus, the need arises to provide ways of measuring that can be
applied to performance and behavior so that this evidence of student
capabilities can be utilized. Such measurement must be designed
without totally compromising the criteria on which good measurement
is based, namely that a test or procedure be appropriate, valid,
reliable, interpretable, and usable (see Part III). The procedures
described in this chapter are aimed at making the assessment of
cognitive and affective behavior possible within the limits of these
five criteria.


MEASURING PERFORMANCE


f„r the measurement of performance is 
The prototypic s‘“ation for tn ^ a task, 

an individualized testing .dentify a malfunction to 

that task being to '‘’''''element a decision or solution Typically, 
make a decision or to ®P'7“i either solutions (that, m some 
Performances are associated While the student 

rases are products) °‘'’“P -aer to achieve a solution or imple 
P on;P.tmg in a hands on mame ^^ho use a 

^ ♦ ran he or she JS ohs cteo in the performance In a 

T"kliP to record or evaluate oa tj^r judge (The 

S sefung of oon-, .heJ^ftL. .he performance is being 
student may or may , 

observed, but usually ho 

. The lem. TOifo-SiS 

,ty orientations anu 





172 Checklists and Scales 


Measuring Skills 
and 

Competencies 


problem and/or'" “ student is asked to solve a 

Lctionml^^^^^^ " Where IS the mal 

ing order of size^witlin.,# ^ washers on a peg in decreas 

smaller one c Of th,c Pitting a larger washer on lop of a 

-■ec.eda“heeorrJe,„n:”rvS^ 

way to get across the island sl. "'°“W be the fastest and safest 
Were eight of vou anH assuming that there 

camedle ActC a sk ^bTu " "> >’= 

S™tt Key wrote 1 sV^r e “ <h'= reason why Francis 

individual student or prm. In each instance the 

order to produce a soIuiion^F^ students is asked to perform in 
or steps undertaken to achieve'^fh^^*'?" focuses on the procedures 
py "f .he solution ■ Le, us “‘i™ as on the qual 

more detail ^ consider some specific illustrations in 

student unit be abl'e “"uMTLt^”^ '’T“ business data the 
•’“"'b 'hese data onto cards Pf“sramnied control 

uroected errors while followioE the with no un 

To evaluate a„ P^oedure 

puth^^'d g.ve'’ra°sl!::r f d to^t was 

data i?"* '"‘‘“'otedatTke^ “ti a deck of key 

and then° teacher observed punch the 

ated in lemToT'* ‘il"' ''"'.‘bed product" 'd"" P=rf°""once 

the time, ti, ^ ‘t the number of p.-,. i ® product was evalu 

<•- waVet;™:r“'-.> ol'p^rifThe ' 

(described by Tu' v °t >ho checklist 

Brown (19701 '®*T) ^'■“wn in Figure 7 1 

o?ie™rH“ de:T?”'f~?p?o''S^^ valuation of 

T-f°™ooces vabdaTedty" mX^of t 

* the perform ^ these occupa 

“ rtudent is eialuated 





Checklist for Evaluating Key Punch
Performance in Data Processing*    Figure 7.1

(1) Student was capable of activating all three
    of the proper functional control switches at
    the proper times. ____
(2) Student used the correct procedure for
    punching numeric data. ____
(3) Student used the correct procedure for
    punching alphabetic data. ____
(4) Student correctly marked up the given data
    sheet to indicate column punching
    designations, field sizes, and program
    card layout. ____

card from his or her own designations 

and any given punching and venlying 

instructions 

card on the dr um 

pl“"ot'i=X)Ih-^ 

smoot h manner 

r.ons and developed ^ISs "apP^lad 

SrX' last t'™ rtte measurement of performance 

The”Sdents were “ 'jj^ed m Equipment to locate 

wi* a fault or Lt adm.mstiBtor evaluated not only 

the fault was availao 





m^tmcon sheet 

^™'™"'>ff Perhrm^nce m Commercial Food 


E«per,e„cee .Inexperienced 

WE SEQUENCE 

"Sx'n"'® “P™ 

^Made'‘2',M"“'’'"^ ^PPPlaPles 
e Italian dressing first 

"pxt 

n-r^UAN DSESS^r" '°''‘ 

J’lacea ares.’’™”*' '"8'PPi«nts 

■ -Mrxea the *1? ''’ '®'"9P'ator 

-Used . Pplore esmg 

J^Pareasonetrie am™, 

,^Sss7„g a'lP' making 

'«EAUos„c„ea, 

"a™«d P'PvIded 

^:'9--pnrmir'^“pp'"8 

'—^'0 not Chnea 

""al'chunke'^ 'xPassively targe nr 

'^"''PpS’oT™'' 

'^“’'sred excess ''“aPapP's Icvet 
Cloth P'Pcns »,ih namp 

"'"'“-"'PPe.cess greens 


Pilot student . Vocational Student 

THE SALADS (Continued)

____ Wedged tomatoes or chopped them
____ Placed tomatoes on top of salad
____ Cleaned up station

TUNA SALAD

____ Used commercial opener correctly to open can of tuna
____ Drained can of tuna
____ Broke up tuna before adding mayonnaise
____ Diced celery uniformly
____ Used appropriate celery size
____ Used less than half as much celery as tuna
____ Ingredients were thoroughly mixed, adding mayonnaise as needed

product was 

shanp enough to retain 

shape when formed 

tagerMer''*'' PPicPreP and re 

_;9-p^;rnrr ^ 

— SelecTe°d"tre orf" 

those provided 9=t"lshes from 

<=Srup"LToV' 





SOLE FISH PLATTER

____ Used a six- to eight-ounce portion
____ Cooked fish to a golden brown, crisp appearance
____ Removed fat by placing fish in a pan lined with a towel
____ Wedged lemon
____ Garnished with tartar sauce and lemon wedge
____ Final appearance of the dish was acceptable
____ Cleaned up station


CHOPPED SIRLOIN

____ Started with clean utensils
____ Used a six- to eight-ounce portion
____ Formed a smooth oblong patty
____ Cooked the meat to the right degree of doneness
____ Started the meat before the fish or omelette
____ Cleaned up station


*Reproduced from Brown (1970) by permission of the author.


Instructions for Administering a Performance Test in Electronics



Purpose ^ Select and operate test equipment 

To measure the ="";'V°jrver! (2) Develop artd apply a leg.cel 

associated with the AN/FRC-109 aN/FRC- 109 receiver 

procedure in locating malfunctions in the an 

Equipment facility and Simpson 260 VOM Hi Band 

The AN/FRC-109 receiver S,er,a 128A Oscilloscope Tek- 

test set, Frequency Selective Vtdtmete matching trans- 

be required to select an a You will be 


Figure 7.3 





Rgure 7 4 


to 1 IT". 

e Irv^Ll P)Ltotaato Of the faults 

lem, talking W.II not be permitted during the test 

Scoring 

ni c™ptrg''rrrecuy pSmT'^' "' 

(2) Completing correctly problem 2 ^ 

(3) Selection and use of ® points 

WTrcuhleshcchonr^ltor;''"'’"’"'’ 

5 points 

»Pncctn?™a?^emt,S'^A™""“ '^‘■™ fro™ 

filing vvrth u„„s of forrC Perfonnance test in sc.ence 

mended for fourth gradeS thf /j’" ^“*1" test 

'='gl»sofknotynfor« rnd eranh “ S.ven a spring scale, 
the spnng He or she is then aS f’a"' =«'tbrate 

=n unknoun force using the cal S *«PPnitne the magnitude of 

male;::, 'he perfo™„ce 1°" ■" 

(2) ■nsSm^s r"‘ f') t, Irst of the 

lions in I mid what t ^ administrator in 

IOe|.ctl„,| , “"'TS OF FORCE 

'-It Provide the child with 

weighs one newrton a sonnn “°"’®'"ers, each of which 
~ en a tr^Tg he has not seen before 
wrm centimeter tape on o^».^ Tnpod Spring 

Comfi'^P*' P^P't Tell h|~ “ pencil and 

Weighs one °p these 

THAT°^’’" paper to CAnn^-?^ 'PHEM AND 

TO MF»°'^ ttSE the SPRING SO 

TO measure forcfc stretch of the spring 

neSr” 'PMesk^o^ “ 

One check should be given in the acceptable column for task one if
he plots one point correctly, one check in the acceptable column
for task two if he plots two points correctly, and one check in the
acceptable column for task three if he plots three or more points
correctly.

(Objective 4) Using your hand, pull the spring to some length
within the range of calibration. Tell him: MEASURE THE FORCE THAT
I AM EXERTING ON THE SPRING WITH MY HAND. DRAW AN ARROW ON YOUR
GRAPH TO SHOW ME WHERE YOU ARE READING THE FORCE AND TELL WHAT
THE READING IS. One check should be given in the acceptable column
for task four if the child indicates the correct point on the graph
with an arrow and states the measure of the force. If he merely
gives a value (say, 3.5), it is permissible to prompt with the
question "3.5 what?" Allow an error of 0.2 newtons.
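The scoring rules for this test can be sketched in Python. The sketch assumes the plotting checks are cumulative (a child who plots three points correctly earns the checks for tasks one, two, and three); the function names and sample readings are hypothetical. A small tolerance is added to the comparison only to absorb floating-point rounding.

```python
def plotting_checks(points_plotted_correctly):
    """Checks earned for tasks one to three (assumed to be cumulative)."""
    n = points_plotted_correctly
    return {"task one": n >= 1, "task two": n >= 2, "task three": n >= 3}


def reading_check(stated_newtons, true_newtons, marked_correct_point,
                  allowed_error=0.2):
    """Check for task four: arrow at the right point and a reading
    within the allowed error of the true force, in newtons."""
    within = abs(stated_newtons - true_newtons) <= allowed_error + 1e-9
    return marked_correct_point and within


print(plotting_checks(2))
print(reading_check(3.7, 3.5, marked_correct_point=True))
print(reading_check(3.8, 3.5, marked_correct_point=True))
```

Whether the child states the unit after being prompted is a judgment the administrator makes; it is not modeled here.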

• Adapted Fro” ihrAdroSe^roTsae?ce'’^pu\l.sM by 

&n^?n'rc“om^rn“yTx"rox corporation, 


Figure 7.5 is a teacher-built performance test in mathematics at
the fifth-grade level. One could argue that all testing in
mathematics is performance testing, since math-related behavior is
essentially of a problem-solving nature. Even where multiple-choice
items are used in math, problem-solving behavior must occur to make
a correct choice. The most useful distinction may be that in
performance testing, unlike short-answer testing, the tester gains
access to the process as well as, or in addition to, the product. In
this test the student is asked to measure an angle and to make a
determination of an angle that would be produced by reflected light.

cf tt."round 

form) P“^f ® pnie limit of thirty minutes 

well as content T 





Figure 7 5 


<hat Separate thf [est .Ilustrates the less than exact hne 

I'ke a?esTav tis, Th.s test seems much 

pencil or typewriter The"mi materials are paper and 

and a perfcrmance tes^ isTw an essay test 

test IS an actual product ratheVJlf ^ performance 

proauct rather than a set of thoughts or ideas 


pte Perlormance Test m Mathematics lor Filth Graders 



as snowo otaw a „„e show 
»>« angle m , "'met an£ 

'la angle ol "eTeoZn T 

"°'’""»'lhe latte, t, (' pol 

migh, 5 " P”' 

f-'r 

''•nehing and learning may hate th"^ ’’’p P'mfo™' 





DECIDING ON A PERFORMANCE TEST 


A performance test is usually given on an individual basis; hence
it is a time-consuming form of testing. Before engaging in
performance testing, the teacher should examine these criteria:

m Real Performance Requires a Hands on Situation Cer 

(1) Real r student can handle actual 

tain performances can occur o ^^ abstracting the perform 

materials or equipment In essential validity, 

ance on ^ ,^4 L" .^r than your objective 

W^hera:“bfrrequ^^^^^^ 

an“ whTcrpTer and pencil are .be equipment, as in letter 
writing) 

(2) Access to the Process Is Essential. Group paper-and-pencil
testing prevents the teacher from having access to the process or
procedure by which the outcome or product was arrived at. In order
to see how a student solves a problem, makes a product, or
implements a solution, the teacher must observe the performance of
that student. If the process is itself as worthy of observation and
evaluation as the product, then a performance test is needed to
allow the teacher to witness and judge the process.

(3) The Final Outcome or Product Has a Material Form. Often we try
to get students to produce decisions or ideas. These can usually be
measured by paper-and-pencil tests, but the situation is completely
different when the final outcome of the test has a material form.
Asking a person how to make something is not the same as having him
or her make it. If your purpose is to obtain a valid measure of
process and skill in performance, a performance test is needed.

formance. Psychomotor Learning (le., 

(4) You Are Trying ° and affect (what you 

Skills). While cognition (wh ^ aadpupcil tests, skill acqui 
feel) can often b= ™ca^^ ^ u.us. physically do 

Hen” a performance les. is required 
tbcf H Individual Ellectlvencss in a 

(5) the PntFf^ of ” 

Group Setting- 


Essential 

Considerations 

(Criteria) 





'constructing 


Box 71 


Po™an?e1esL™°s''nredt“d (ThT'‘ 

person in a u (That is you have to observe the 

order to"e=iow ^L'\'T^^ '"'‘5' contrived m 

a problem ) A person niiJht"hT^r 

ually or to describe in rvr.? ,1 ^ j ^ ^ ^ problem individ 

ble of actually workinu in desired behavior but be mcapa 

test taker to demonstrat Performance tests require the 

demonstrate performance rather than describe it 

tion You'*caruI^l”Derfn*^'°*’*'^' Understandmg by Its Applica 
but only in terms of the measure understanding 

The outcome is still a concrw ^ °f understanding 

well be a rellection of certain ^ construction may 

formances like riding a bike s Processes {Habitual per 

'“"s=r reflect cognISn ) automatic, and hence no 

A performance test 

A performance test ne v 

loache, and invokes the studTm "'“■'‘•'d by the 

In “"*>■'-03 It IS usuair under a con 

armed at I, d^""" ''‘"ah the nrodn , P™“as-the proc 

'> does in all instances the^ a f Produced or 
” then deal with the ability of 


pass/faili 


- — • • MIL.' 

Ifa neo who shows up | 8 000-10000 leet"’Tne'’n''‘’ 'hoTOobes" 

" ^ - “h,ec,i™^:rpl'’r"™'’- Wbe,e ,be or, 



Constructing a Performance Test 181 


the student to do something rather than simply to know some 
thing identify something or describe sometlung and it usually 

listed and explained below 

specifying Des.r^d Perfo™ance^^^^^^^^^^^^ 

in constructing a perform obiectives stale actions by the 

ance objectives S‘nce perf lately include criteria for 

learner under given condi Hesited performance outcomes 

evaluatmn), they represent s teachers should focus on those 

For performance demonstrate and construct be 

objectives 'hat use such verbs ^ 

cause these \erbs sp y objectives do not exist the 

r^rslT/rrurtht Bdow are examples of performance 

°'^^rrmonstrate a procedure for measuring the volume of a 

. Demonstrate a about 

. Construct a poem that desc^_^^ 

some aspect or “SPeu 3 

. Demonstrate a proceu^^^ for conducting an interview 
. Demonstrate a pro „ jndian village 

. Cons.ructamodel o u„ 

. Demonstrate the abuiiy 

twenty out of cardboard 

. Construct a cuoi 

. Demonstrate a procedur^^_^^^__^ ,^3^33 3„d 

• Construct a disp y 

their characteristics 

(2) Specifying ‘If mptt to prXe the desired 

otj1ic"wlf^te a performance test 





m a set of “V"’’ " f"""" 

n.us,besptc,r,cd Constdcr.hTLmpVTbdm™' 

unknovtn™t,ph' ^ I’alaiiii- for ucifliiiiK oltjccn of 

known weight *’(3’)"rsc*'of\'"'^'”' c'lnitlncal prism of on 
' W'lligram weights' nini lOOm'lir i" f""""' 
weights and thrLspram inph, ' ''""'" 

anco,nnd“rset""w‘^","u^^^[l‘’' 

"cjphi of ilic untnoun „ ^ li'^lancc to determine the 

hate file minutes to do this '"'"'Pram You will 

g°™™«aroekdispla5 

•he 'nalcr'ar>oinited"(i'ne^‘'V’"'^''’ "" ‘”■"'1' "" 

"sd) toclass Thcm.„,muL“‘'"’^^'>'>‘' -'“u l.asc path 

'hat are nteded arc includ 'he Isptsof mU 

'nstructions Make a rock d.vnl" '^i'^ "‘'"'"ee iiiforntation 
'ng and that contains at least sev^' "'^'i'' •"'ercst- 

so^'i’i learned about The " 'ajpks with some of each 
'hat the display can be lill ^ he moiintcd 

eoom You Will get un to in"*^ "I’ '"hie in the class 
-p lay up to ,l loZ,[V°T“ Z 

• InsrM^°'f“'''^'''''enes5 of no™’’ "P '<> 1“ 

Install an electneal outlet m no 
G'lcns access to a stamt i '’I’"" ""'I 

or oral Th bc\*»rip i 

I™ althougnfue'’d““'‘‘ h'aunimenrrorsu"'"''r"“^ are written 

•h= studenf may be 1"”' ^P«'heal rs^nm^f r' ^'"PPln 

““Id he adequate hTl of the 

ne IS interested in 






II he really W** 


he would" 1 xieke the chalk screech I ke that 


r ir f the student A test that is inadequate in its 
measuring the skills o measure a student s skills with any 

®ahdT.y Vrop^'trcons.ruction includes suitable preparation of 

and instructions 

tnr Procedure) and Product Criteria 

(3) Specifying access to the product and poten 

Performance tests give p„„dure undertaken to attain that 

tial access to the pr „ m the process when it occurs 

product but °^Xtthe procedure ts complete *0 observer no 

very temporary Once tn P videotaped or unless 

" obiecttve of subVttyb-l' 

test has a ce students mind the teacher will be 




atl ‘he 

To answer this ouestmti ti correctly^ 

the procedure The occur^***^ she must list the necessary steps m 
criterion for evaluating n J* these steps then constitutes a 
tola for demonstrating thfusTof “ 

d-play and installing an ekctricfl onto"'' “totructing a rock 

unknown tveij^r ^ balance for weighing objects of 
b Place ohjKts“lf“knm™" "“f' balance 

overbalance occurs ™ights when underbalance or 

10 achieve balan™ wnhl n f“ P°'"h™a‘'°ns 

always bringing the scale .“‘Editions and subtractions 
combining weights on a hit (rather than 

/ Add the weiehis th!. basis) 

. "'='§htofthe unknown'’™‘*““ ‘° determine the 

Constiuct a rock display 

^ the 'gneous'’class‘''tW°sed ‘b‘'“ 

morph, c (These proportion^ “d one meta 

"ymlabiliiyof theseXTwnVt 'b= B'l'mral 


; ‘£™*S:s 
„ i S§ii?==s^s3, 





Recall that an objective includes a statement of criteria for
evaluating either the process or the product or both. In essence,
this third step is simply an elaboration of the criteria that will be
used to evaluate the performance. These criteria should be spelled
out in sufficient detail so that the teacher can adequately evaluate
the performance in as systematic a way as possible. They should
also be presented to the students, either as part of the test or
during the instruction related to the task. Detailed criteria reduce
the subjectivity of the judgment.


(4) Preparing the Performance Checklist. The performance
test is made up of a set of instructions and givens that are
presented to the student and a performance checklist that is used by
the teacher or judge. The performance checklist is just a format
for listing the criteria that were developed in the preceding section.
If a criterion on the list is met, it is checked; if it is not met, it is
not checked. Consider the examples below.

• Performance checklist: Bisecting an angle
  ____ a. compass is used
  ____ b. point placed on vertex, arc is made between sides
  ____ c. point placed on each intersection between arc in
          (b) and side, equal arcs are made
  ____ d. line is drawn from vertex to intersection between
          arcs in (c)
  ____ e. two resulting angles are equal when checked
          with protractor
  ____ overall quality of performance on a 0-5 scale

• Performance checklist: Drawing a realistic nature scene
  ____ a. a sketch (or outline or layout or some sort of
          plan) is made first
  ____ b. materials are used properly
  ____ c. drawing is completed neatly
  ____ d. elements of nature can be seen in the drawing
  ____ e. colors are realistic
  ____ overall quality of drawing on a 0-5 scale


The performance checklist reflects both the steps undertaken
in arriving at the product and the quality or acceptability of the
product itself. It is a set of instructions and procedures by the
teacher to himself or herself in terms of what to look for in
evaluating a performance. It also provides a basis for giving students
useful performance feedback and attempts to make the evaluation
of performance as objective and quantitative as possible.
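One way to picture a performance checklist is as a small record: a list of criteria, a note of which ones were met, and an overall 0-5 rating. The Python sketch below is only an illustration; the class and method names are hypothetical, and the criteria are those of the angle-bisection example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class PerformanceChecklist:
    task: str
    criteria: List[str]
    met: Set[int] = field(default_factory=set)   # indexes of checked criteria
    overall_rating: Optional[int] = None         # 0-5 scale

    def check(self, index):
        """Mark one criterion as met."""
        if not 0 <= index < len(self.criteria):
            raise IndexError("no such criterion")
        self.met.add(index)

    def rate(self, rating):
        """Record the overall quality rating on the 0-5 scale."""
        if not 0 <= rating <= 5:
            raise ValueError("rating must be between 0 and 5")
        self.overall_rating = rating

    def unmet(self):
        """Criteria not checked, useful as feedback to the student."""
        return [c for i, c in enumerate(self.criteria) if i not in self.met]


bisecting = PerformanceChecklist(
    task="Bisecting an angle",
    criteria=[
        "compass is used",
        "point placed on vertex, arc is made between sides",
        "point placed on each intersection, equal arcs are made",
        "line is drawn from vertex to intersection between arcs",
        "two resulting angles are equal when checked with protractor",
    ],
)
for i in range(4):          # a hypothetical student meets the first four
    bisecting.check(i)
bisecting.rate(4)
print(bisecting.unmet())
```

Listing the unmet criteria is what turns the checklist into feedback: the student sees which step fell short, not just a number.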





treatae'm“and hcn« alfnXr""" •» ihi^ 

detailed. The checkiiit U checklists arc not equally 

terialhatareinhis orhar "'“'''"•T "*• 

them by writinc them externalize and sy.stcmati/c 

considered the scorinc Pc’rformancc checklist can be 

listed as a toe form the criteria 

torrnaace shown at ihc bottom ohhc^Wkllr''''’’'''’" 


SCORING THE PERFORMANCE TEST


™rioga‘;erfo™a™"Jj;,"“ ^ ™los for teachers to use in 
(that IS, the requirements as set f If requirements 
jve importance of the different erit ^ the tela- 

he “lent to which thej can ^0'^ 

«ehers can build thefrown apnroTHv Performance, 
e'er some gnidelines may be helnfSr=id''‘’""l P--'«edures. How- 

tr .0, 

yoursel/n” ^ reasonable ) Exnmi ^ eighty percent is 
as: Are "n ask 

eeceptable ‘mportant? Are somJ 'u' ™"'P°"ents in the 
‘ery? You/'"^”™"’"™’ What ferec nf “^^“'“Wly essential for 
"hither von “-“e q'.es™n/!“u"f ^ constitutes mas- 

"hat arbilrar^"' fo us^ ^ weighting system' r^ecide 

rule. Or you m, Percent sucmss-.„°'^ *° ^“hu'c 'he some- 

performance bLsL*"”’’'^ '« assign Proficiency 

a' an indication o‘f?II‘^"'''^''hst entries anr’’" '° ’’’= 
“f performance Cot, m students degree of this rating 

• Perfo™^”''=^'’“»='^ 

~ -a. e'udmu'oVs handstand. 

h. student is perfeM ®"' 

^ f: ;Su^'’‘ ™ fee?' «f'=cn seconds 

0-5 scale) Performance on a O -15 scale (or a 



Constructing a Performance Test 187 


Box 7.2 


A DIVING CHECKLIST 

Have you ever attended a diving, gymnastics, or figure skating competition
or seen one on television? They constitute performance tests.
Have you noticed how, after a performance, judges hold up cards with
numbers on them (7, 7½, 8, and so forth) to indicate their evaluation of
the performance? We can presume that the judges are all using a similar
checklist, based on their expertise and judgment, although the checklist
is often unwritten. Here is an example of what a checklist for the triple
somersault dive might look like.

Performance checklist: Completing a triple somersault dive from a
three-meter board (tuck position)

a. shows balance in approaching jump
b. jumps evenly, erectly, and high
c. goes smoothly into tuck position (i.e., initial entry into tuck position describes an arc)
d. completes three complete somersaults
e. doesn't roll while somersaulting
f. comes out of tuck position in three motions
g. is out of tuck position before entering water
h. body is straight in entering water
i. entering splash is small and quiet
j. entire motion from start to finish is smooth

Overall performance on a 0-10 scale

Note that there are ten criteria. If each criterion that was met with
minimal acceptance were scored ½ point, each criterion met "perfectly"
scored 1, with a 0 for each failed, the addition of points would represent
a final judgment.
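
The point scheme described in the note can be sketched in a few lines of Python. This is only an illustration: the rating labels ("failed", "minimal", "perfect"), the function name, and the sample judgments are assumptions made for the example, not part of the checklist itself.

```python
from typing import Dict

# Point values follow the note in Box 7.2; the labels are hypothetical names.
POINTS = {"failed": 0.0, "minimal": 0.5, "perfect": 1.0}


def score_checklist(ratings: Dict[str, str]) -> float:
    """Add up the points earned across all checklist criteria."""
    return sum(POINTS[rating] for rating in ratings.values())


# Hypothetical judgments for criteria a through j of the diving checklist.
dive = {
    "a": "perfect", "b": "perfect", "c": "minimal", "d": "perfect",
    "e": "minimal", "f": "minimal", "g": "perfect", "h": "perfect",
    "i": "failed", "j": "minimal",
}

total = score_checklist(dive)
print(total)  # five perfect (5.0) + four minimal (2.0) + one failed (0.0) = 7.0
```

With ten criteria, the total necessarily falls between 0 and 10, which matches the 0-10 scale used for the overall judgment.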


Suppose that you felt that each criterion was equally important
but that you wanted to reflect the degree of performance. You
might give a student one check for a category if the performance
was minimally acceptable. (A student who, for example, got up on
the second try might get one check on criterion a.) Clearly acceptable
performance (such as getting up on the first try, wavering,
but staying up) might earn two checks, while outstanding performance
(e.g., getting right up the first time) might earn three checks.





The student's score for the total performance of doing a handstand
then would be the number of checks he or she earned, ranging from
zero to fifteen. Teachers desiring a simpler scoring procedure

five T f f out of 

™uMhe to f performance A third possibility 

Z u be m ^ and to require 

remaining four needed in addition as a basis for passing 
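
The check-counting procedure can be sketched as follows. The sketch assumes five criteria labeled a through e (implied by the zero-to-fifteen range with up to three checks each); the sample checks and the `required` threshold in the simpler pass/fail rule are hypothetical values chosen only for illustration.

```python
MAX_CHECKS = 3  # 0 = not met, 1 = minimally acceptable, 2 = clearly acceptable, 3 = outstanding


def total_checks(checks):
    """Return the total number of checks earned across all criteria."""
    for criterion, n in checks.items():
        if not 0 <= n <= MAX_CHECKS:
            raise ValueError(f"criterion {criterion!r} has an invalid number of checks: {n}")
    return sum(checks.values())


def passes_simple(checks, required):
    """Simpler rule: count criteria met at all and compare with a chosen threshold."""
    criteria_met = sum(1 for n in checks.values() if n > 0)
    return criteria_met >= required


# Hypothetical checks for one student's handstand.
handstand = {"a": 1, "b": 2, "c": 3, "d": 2, "e": 0}

score = total_checks(handstand)
print(score)                                   # 1 + 2 + 3 + 2 + 0 = 8
print(passes_simple(handstand, required=4))    # four of five criteria met: True
```

The graded version preserves degrees of performance, while the simpler rule only records whether each criterion was met; which is preferable depends on how the score will be used.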


MEASURING BEHAVIOR


Does the student exhibit a tendency to work willingly with others toward the
achievement of a common goal, or would he or she be more likely
characterized as nonconstructive in interpersonal relations?
Is he or she the kind of student who works beyond required
goals or one who can barely keep up? Is he or she
on time or usually late? Is assigned work completed on time or is
it consistently late? Does the student continue to perform and
work well with others in a high-stress situation, or does he or she
seem to lose composure? These are the kinds of things that teachers
look at and consider important in their assessments of the
personal attributes of students.


Why Measure We have »„ii, j , 

-bavlorl tbinkTo^l-tranTr^Z^ bow they 

lasrh'’r““boals.nerZo„ o" of these 

olass behavior (which m tuZr ^ v'" •=‘'bing about their 

persona attributes) Ar'daZ:;"? "^ging their 

teachers^n relevant and important^ personal attri 

>esPartnf obouMhem’ Should 

imolves devpinni “"'■‘•ered desirable • p™ '*? ,‘*“'‘''0 out* °tl 


•molves devpl ^ desirablp « p drive and othe 

Needless tosav a u Present and the future 


Needless to say a tea >. ^ Present and the futur 





Do teachers currently evaluate student behavior and the personal
attributes that this behavior reflects? Again, the answer is
yes. In some form or another teachers observe, record, and often
report student behavior, and judge personal attributes. In doing
so, they are at least implicitly acknowledging performance objectives
such as:

• The student will demonstrate self control by attending to 

, :“.rrr 

matches or other forms of acting out 

.Cl, ire to be developed, then there is value in 

If j to what degree they are being formed 

determining whether a j enhancing or changing the 

“ 

makes sense to measure them 

The key to answering the question "Do we test behavior?" is the word "test."
If you were asked, "Do you observe behavior?" you would quickly
answer yes. If you were asked, "Should you test behavior?"
you may not be sure how to answer. Clearly, you would
not be inclined to test behavior in the sense in which the word
"test" is commonly used. You would not be inclined to construct a
specific test situation in which students would be placed and
their behavior observed. In such a situation,
moreover, students would be likely to perform at their maximum,
that is, manifest their best behavior, because of their high motivational
state. If you wanted to measure "typical" behavior,
you would not want to create an obvious test situation that might
alter the "typical" behavior. You would not, for example, upset the
classroom on purpose to see if students would control themselves,

•Our.n.ere|.-”,reT/i»^^^^ 

l,e ,t and m mea 
underlie them 


Do We Test 
Behavior? 





or give an independent study assignment to see if students could
generate self-initiative. For the most part, though, behavior is an
ongoing stream without the clear starting and ending points that
characterize a test.


There IS an important sense however in which we do want 
to test behavior and that is the sense in which a test is system 
We need a vehicle for describing 
>“t vehicle that measures 
Sidv itsim ■' upon It 

mstm™ “ "'^'hodology and more particularly 

will be specificallv^ though not necessarily all of them We 
■or that S be Isr"* ^ d^^cnbing behav 

tivity For this pumose''wew?i"(f*?'? '^ogree of objec 

tneasurement procedures it. i,*"'* v scales or checklists (basic 

contextsrare S smubU ' 


£xamples^ 

follow TbeJe^xaiSs are behavior some examples 

before (but are not for sale in ^bat have been used 

They are presented here to serve published tests) 

structing their own instruments ^ teachers in con 

tlents (and students personal ^oasure the behavior of stu 
collectively ^'tributes) both individually and 

called figure 7 6 It is 

atudent behavior on sm “ "'‘™P' ^ttantify 

^“Jance on the meaning of are given tohS 

nroc!? 'Vhde this ex^n^ assigned 

can fi the kind nF if i? * a system wide 

a^sr" - No"tfc:^tS: rs^rnr*-" 

measure summarizes 

brhai or'^an^ 'tbe making between il. 

trnrlieij niue^nuo^^SC “Onbules ihi' ,^15”'^'™'"* of students 

‘I based on the 

^ smg here M^y “l'<tescnp,„„j It is S ib 

able and w,|| tor students self r^„ *= former we are dis- 

mscmsed .„ Chapter 14 taPorting are currently avail 





Figure 7.6 Sample Maturity Index

Trenton, N.J., Public Schools

MATURITY INDEX

Name (Last, First, Initial) ______   Address ______   Date of Birth ______

H.R.   Adv. Gr.   Date   Teacher's Signature   Reliability   Work Habits   Self Control   Initiative   Sensitivity   Punctuality

Home Room Teacher*
Pupil's Estimate

THE RATING GUIDE APPEARS BELOW

Indicate the degree to which this student measures up to the goals
described below.

4 - Exceptional
3 - Above average
2 - Average
1 - Below average

Reliability - Does he work willingly and capably with others toward
the achievement of a common goal? Is he truthful,
honest, dependable, conscientious? Does he assume
responsibility for his actions?

Initiative - Does he work beyond required goals? Does student
assume leadership? Is he creative, original, exploring?

Work Habits - Does the student complete assigned work?

Self Control - Does student remain emotionally poised and physically
restrained under stress?

Sensitivity - Is student sensitive enough to be thoughtful about his
attitudes and responses? Is he considerate of the
rights of others?

Punctuality - Is student on time?

All results are summarized. A cluster of good ratings will offset
one poor rating.






It hasn't helped their reading, but they've become very proficient
with computer hardware.


The second example appears in Figure 7.7. It is
adapted from Pygmalion in the Classroom by Rosenthal and Jacobson
(1968). Unfortunately, some of the terms used to describe
student "behavior" in this measure (e.g., "interesting," "appealing,"
"needing approval") sound more like descriptions of teacher attitudes
toward a student than the way a student behaves.
In order to use the instrument as a behavioral measure, specific
instructions would need to be given to make the meaning
of the terms clear. They should direct teachers to respond to
observable indications of behavior (and attributes)
rather than manifestations of their own feelings toward a particular
student (see Box 7.3). For example, "curious" behavior might be
described in the instructions as trying to find out how things work,
asking frequent questions, looking

:^;:o'riorprer"‘ - -.set,?rr."he1e""® 

“c learning and 





Figure 7.7 A Classroom Behavior Rating Scale to Measure the Student Behaviors
Studied by Rosenthal and Jacobson, 1968.

Name of Child ______

Name of Teacher ______

(1) To what extent can the child's behavior be described as CURIOUS?

NOT AT ALL CURIOUS   1 2 3 4 5 6 7 8 9   EXTREMELY CURIOUS

(2) To what extent can the child's behavior be described as INTERESTING?

NOT AT ALL INTERESTING   1 2 3 4 5 6 7 8 9   EXTREMELY INTERESTING

(3) To what extent can the child's behavior be expected to lead to FUTURE SUCCESS?

NO EXPECTATIONS OF FUTURE SUCCESS   1 2 3 4 5 6 7 8 9   EXTREME EXPECTATION OF FUTURE SUCCESS

(4) To what extent can the child's behavior be described as ADJUSTED?

NOT AT ALL ADJUSTED   1 2 3 4 5 6 7 8 9   EXTREMELY ADJUSTED

(5) To what extent can the child's behavior be described as APPEALING?

NOT AT ALL APPEALING   1 2 3 4 5 6 7 8 9   EXTREMELY APPEALING

(6) To what extent can the child's behavior be described as HAPPY?

NOT AT ALL HAPPY   1 2 3 4 5 6 7 8 9   EXTREMELY HAPPY

(7) To what extent can the child's behavior be described as AFFECTIONATE?

NOT AT ALL AFFECTIONATE   1 2 3 4 5 6 7 8 9   EXTREMELY AFFECTIONATE

(8) To what extent can the child's behavior be described as HOSTILE?

NOT AT ALL HOSTILE   1 2 3 4 5 6 7 8 9   EXTREMELY HOSTILE





(9) To what extent does the child behave in such a way to indicate that he (she) NEEDS APPROVAL?

NO INDICATION OF NEEDING APPROVAL   1 2 3 4 5 6 7 8 9   EXTREME INDICATION OF NEEDING APPROVAL


Box 7.3 


Rosenthal and Jacobson describe the experiment in their book
Pygmalion in the Classroom. Teachers were given a list of students who
could be expected to bloom academically, presumably because
they had scored high on a test. In fact, the students whose names
appeared on the list had been chosen randomly from the student body.
What happened to the students on the list? The
researchers found that the "bloomers" bloomed: their IQ scores
increased over the course of the school year.

The researchers also obtained behavioral ratings of the students
at the end of the school year, using an instrument similar to the one
shown in Figure 7.7. What happened? Teachers rated the expected
bloomers as being more interesting, happy, and in line for greater
success than those not on the list.
Apparently the classroom behavior of students was affected by teachers'
expectations of them; perhaps also the teachers' judgments were affected
by their own expectations for students. It is important for teachers, when
completing rating scales, to record what they actually see,
not what they expect to see.

Although CCrlsin a 


^iidm, Sei/^jc'S'sco/e’^nV' I‘ >p Palled 

^•■e'lPr of indiudual students'’ “"^d for descnbing 

m r?' o? “tu™" 'i;?' “ blass of students if. 

eucial "™ .'“Ip 'Vthout teacW^^L’" ‘=‘=1' area (e 
'be fore'" ® ■ ‘I'Pl not treat others ^nd in 

‘«'>ond I's fi?'"® 1° Perfdm V ’* ’^P'''="Pn'e b. 

E en requirements) and thocp ^ ^ 1« 
i and those restraining him or ] 





or self-controlled actiutv 


Figure 7.8 Student Self-discipline Scale*

Student ______

Degree of Occurrence: BEHAVIOR ABSENT to BEHAVIOR PRESENT


Moved to new task as required without 

teacher Intervention 

worked on a task without the teacher 

engaged in task behavior «"hout the 
teacher s prompt e^eirements 

°Mrrr::eru=^- 

completeness ol f f ^3 3 3 <,e,ce of 
used or assisted = 3 ” Jg his/her 

information about do ngo 

^d interest centerasinteoralpartotwork 

Camzed his/her work schedule^such that 

as resource lor on going amivity 

Did not treat others ^ ® ^ another s 

Did not attempt to interlere with 

DTnot press tor the teachersattention and 

afleclion 

rha:ed'"dul. Cher than the 

Te^sked by peers contributed el, ort or 

material to their activity 


1 2 3 4 5 6 7 8 9 (rating scale repeated for each behavior listed above)


* Developed by Tuckman.





CONSTRUCTING A BEHAVIOR RATING SCALE


The construction of a behavior rating scale,* that is, a measure of
the quality or style of ongoing behavior, involves three steps: (1)
specifying the behaviors to be evaluated, (2) describing these
behaviors, and (3) designing the specific scale or yardstick.

Be Evaluated. What behav- 
ioudel '“7^'“=“=’ This will depend on those that 

lying nersonal relevant pnmanly in terms of the under- 

whefhervon h *rymg to enhance In essence, 

gardins stadent S'™ objectives re 

invohe such thinvs persona] attributes These probably 

thehke You mav^ai e ’ ‘'"'oction. adjustment, and 

'vhere you are consciouslv a ^ specific behavior goals m areas 
dent behavior The first sfen^^ sys'ematically trying to affect stu 
-kebabs. of scale ts to 

follous '"“S'rahve purposes that your list appears as 

• The student will
(1) complete his or her work
(2) cooperate with the teacher and classmates
(3) exhibit an interest in learning
(4) utilize the educational resources of the classroom and school
(5) be neat and organized
(6) participate freely in the classroom process

l^eacher has his w^erTwn s^cifi teacher Each 

f “'room behavior althou.h^ °'>J"™'<=s when it comes to 

rlrh'’"'' ‘'-hers can dmw teTf'Lr^’''^ " P°°' 

tern the sanous sources y™^m= ''''a and P-eces 

icrm “scale" I 

Ef yard 

- -■sr'TaSrfcvrfJ--- 

a nrst been established 





(2) Describing the Behaviors. The behaviors to be evaluated
must be more than named. They must also be described or operationalized
so that they can be judged with reasonable objectivity,

rehabihty. -d co— ^^^reLr lo 

tc"nle?srairrnLtbede.^^^^^^^^^ 

-hto’m and attempt to 

describe each 

gun are finished ^jked (and often when not 

asZTtmaintams proper classrooindecoruni c helps class 

mates d '^‘^^^“hlbns knowledge not acquired in 

<-rr 

Lm a^^ut riy and how of things ra.her Ihan jus. accep. 

mg them , „ . specific resources when instructed 

(4) Besonrce resources on own initiative c uses 

to do so b uses sp outside of the classroom d uses 

resources (eg m library consults card 

resources to oesi 

catalogue) „..„t,nn a keeps desk and work area 

(5) Neatness and j use (stores things neatly) b 

organized and neaMvhenn^ciassnioni neat d organises work 

aS materials in a syst^atm^^^^ ^ butes 

(6) parltctpation a for activities d contributes 

"“.“enals Td mformat.on to the class and classroom 
materials .nvolve to some extent a 

Although the above nntena^ ^ t 

teachers subjective neac p„ur points of reforen": 

the teacher to use m u judgment than a single one 

nrovide greater help m behaviors is not the same weigh 





a weight of ‘A All other behaviors would have a weight of 1 Of 
course, if you had no interest in the total score but simply in
recording each individual behavior, weighting would be unnecessary,
since weighting is used to reflect relative importance.
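
A weighted total of this kind can be sketched as below. The behavior names, the ratings, and the particular weight of 0.5 are illustrative assumptions; any behavior without an explicit weight is given a weight of 1, as described above.

```python
def weighted_total(ratings, weights):
    """Sum the ratings, multiplying each by its weight (default weight is 1)."""
    return sum(weights.get(behavior, 1.0) * rating
               for behavior, rating in ratings.items())


# Hypothetical five-point ratings for one student.
ratings = {
    "completes work": 4,
    "interest in learning": 5,
    "neat and organized": 2,
    "participates": 3,
}

# Hypothetical judgment: neatness counts half as much as the other behaviors.
weights = {"neat and organized": 0.5}

print(weighted_total(ratings, weights))  # 4 + 5 + (0.5 * 2) + 3 = 13.0
print(weighted_total(ratings, {}))       # unweighted total: 14.0
```

If only the individual ratings are of interest, the `ratings` dictionary can be read directly and no weights are needed.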


(3) Designing the Scale. As we have seen before, a scale is a
numbered continuum where each number represents the degree of
a particular quality, such as acceptance or rejection, presence or
im T? or os many as 

'’’roo five , seven , and 
while fpi scale points introduce greater variations 

Imber of T*' ''™c of response The odd 

pZt 0 °f providing a middle 

point to tenect an undecided or down the middle judgment 

file point r^Ungsra'ler'^^^ '’'='’oviors into 

(1) STUDENT COMPLETES WORK

1          2          3               4             5
never      rarely     occasionally    frequently    always

(2) STUDENT COOPERATES WITH TEACHER AND CLASSMATES

1          2          3               4             5
never      rarely     occasionally    frequently    always

(3) STUDENT EXHIBITS AN INTEREST IN LEARNING

1          2          3               4             5
never      rarely     occasionally    frequently    always

(4) STUDENT UTILIZES EDUCATIONAL RESOURCES

1          2          3               4             5
never      rarely     occasionally    frequently    always

(5) STUDENT IS NEAT AND ORGANIZED

1          2          3               4             5
never      rarely     occasionally    frequently    always

(6) STUDENT PARTICIPATES IN CLASSROOM ACTIVITIES

1          2          3               4             5
never      rarely     occasionally    frequently    always





cfFectneness 


USING A BEHAVIOR RATING SCALE 


1.0 Ihc use of a behavior rating scale 
Following are examples of the use oi a 

Indhmv the^perform « 

as part of our evaluation of them we m 

(2) Diagnosing ,^°“"“ee”n re'fSred°to ^ behavioral 

Some students present special services and paren 

problems Such students ° helps m the diagnosis 

tal consultation Behavioral meas must be carefal 

and documentation °f ,hat will mnuence the expecta 

not to create a futu tea"Urs and hence their treatment 

tions of that students future 
of him or her 

(3) Altering ■■nP™vlng and ^'Tirhibum'g oAers 

teacher must keep 

tain specific way effect evidence abo j„_*c but on the 

producing the desired effect ^ outcol of the 

rv:?:i«ro"th=cia^^ 

department chairman or p 





Figure 7.9 Sample Classroom Observation Scale To Measure Implementation
of Learning Unit Plan and General Teaching Effectiveness


Classroom Climate

Students:

Degree of Occurrence: ABSENT 1 2 3 4 5 PRESENT

(1) Maintain a conversational noise level
(2) Move freely and purposefully about the classroom
(3) Engage in creative activities to capitalize on their talent and/or interest
(4) Interact with one another in meaningful ways
(5) Seek help from teacher when in need of assistance

Motivation

Students:

15) Sell assess ap7"ecTd' ther" "™ intervals 

oblective ^"0 PteOress per 

teachrimmention"^ minnnum 

'’’leraclion 

Teacher 

Par5cipa!e^!n '^ad students to 

— rts studeots teellp,. 

— -vn™teo, positive 

-om student 

'“"d'mms-oupto group 


1 2 3 4 5 (rating scale repeated for each item above; a Subtotal is entered at the end of each section)





Specific Teaching Variables 

“rnmes m oontmuou. evaluat.on of sfudenfs' progress 

objectives of the und of instrucfional media and 
(6) Makes ^ Lividoalized learning 

P, P.o":r inJ::u':m:a, seccences fo fd differen, learning 


Degree of Occurrence: ABSENT 1 2 3 4 5 PRESENT (rating scale repeated for each item above, followed by a Subtotal)


then the scale is 

■hen the class is m sessio j -^jiy engaged in the instructional 
:acher when he or she ^ n critical incidents as they 

rocess, he or she of the entire process, the teacher 

ccur To increase the o 3 ^ ^j-yed behaviors 

bould try to focus on judgments, particularly when they 

To increase the rel.ab.liy of ju g should be made 

re made of ‘"‘^‘''‘^““'.ndently Makmg two judgments 

mce each time f his or her own internal judgment 

rris,r— 

unued m Chapter 10 ) required for rating the 

The most .n'P°''“"'qSm vvork m the development of the 

<= ^‘‘/^.lence in practicing wrth the 


ople is °'’l''‘'’‘,''_rstandmg of the meaning of scale terms 
adequate understand E (trying it out, lor 

-d ^Tb"?n”hrSS« develop techn.ques for rating be 
pie) will help 

,r objectively 


Some 

Suggestions 





Additional Information Sources

Amidon, E. J., & Hough, J. B. (Eds.). Interaction analysis: Theory, research,
and application. Reading, Mass.: Addison-Wesley, 1967.

Boyd, R. D., & DeVault, M. V. The observation and recording of behavior.
Review of Educational Research, 1966, 36, 529-551.

Flanders, N. A. Analyzing classroom interaction. Reading, Mass.:
Addison-Wesley, 1969.

Lien, A. J. Measurement and evaluation of learning. Dubuque, Iowa:
Wm. C. Brown, 1967, Chapter 6.

Simon, A., & Boyer, E. G. (Eds.). Mirrors for behavior: An anthology of
classroom observation instruments. Philadelphia, Pa.: Research
for Better Schools, 1967-70.

Weick, K. E. Systematic observational methods. In G. Lindzey & E.
Aronson (Eds.), Handbook of social psychology, vol. 2 (2nd ed.).
Reading, Mass.: Addison-Wesley, 1968, 357-451.



Self test of Proficiency 



(1) State three criteria teachers should examine and consider before
engaging in performance testing.

-rr.=r;s-s-r;=r:r= 

these cases 

. . s #.nH out whether students know how 
J’rccnslIucTarequilstorunangle Specify the lest siluafion and 

desired pertormence oufcome 

^ used to evaluate student perform- 

L. a imit on soace relations and want 

(5) YOU have i''®' ''"f "^.,Xtrcan visualise e spatial arrangement 

to find cut and desired perfcrmarrce 

such as sllcor plan Specify a rest srrua 

ohecX/is. that can be used to evaluate student perform 

(6) Prepare a checX/ist inai 
ance In drawing a floor p 

(7) YOU are interested in measu'ins ,o be measured 

in Class IS ihP""'® ? “ observable criteria for its detection 
and describe It V ' ,3 ,he 

(8) Prepare a fs motivated to learn 

extent to which ^ ,3 p^^p^ped 

(9) YOU want to , ha behavior to be measured and describe 

to teach a ^able criteria 

It by listing (bbf ° hP used to measure the 

(10) Prepare a ^prepared to teach a lesson 

extent to which construction of 

(11) You are ^ '3,“ and"ob)eclives Cite one argument you might use 

rrrr--"-'-:";” - — — 

that they are facing 3 program? 



part three/Evaluating a Test 



chapter eight / Test 
Appropriateness 


OBJECTIVES

1. Define appropriateness, a criterion of achievement
tests, as the consistency between test items
and objectives.

2. Describe and illustrate the use of a content map
for building appropriateness into an achievement
test.

3. List and apply checklist procedures for evaluating
the appropriateness of a test by determining
its (a) overall fit to objectives, (b) correspondence
to intended behavior, (c) correspondence to conditions,
and (d) correspondence to criteria.

4. Define domain referencing and its use as a means
of increasing test appropriateness.

5. Identify and list factors based on instructional
and student characteristics (rather than objectives)
that can make test items inappropriate,
namely, testing what was never taught, response
sets, and cultural bias.



WHY EVALUATE A TEST?


Spring 1973 


A test ts a dev.ce for samplmg behavtor 

the sktlls, competences ^^^tan ce°LmpL the 

ms?makL‘ to be ^sfas we e°valuate 

rap;C“of .n-— jrbls:a" 

An Example of an Inappropriate Test 

Introduction to Psychology 

830 121 

'"’a '^h'eCl.c'rtvirblilrodta'ues ol a celture a.laot the behav.o, 
] r:" , percepbon and though. ,a d.Ca.ed by the a, rue 
defmed by the role the word plays .n a sen 
tence rather than by about the tneaning ol words is 

d the agreement betwee 

b'PP , .npcilies a superior retention ol incompleted 

,,,ybaZe,garn.oK~^^^^^ 

over completed tasK c 2to1 

a 3 tol d 1 5to1 ^ 

b 2 5 1° 1 a, barception is most clearly associated with the 

(3) Atransaclionalviewolperoep 

work of c E Gibson 

a Ames d R 99® . j ... 

b J J S";“",bermal sensitivity of the skin is usually relerred 

(4) Nales model ol therm 

as the ,b,orv Iccal generator theory 

a concentration theory ^ vascular theory 

b gradient theory is most clearly illustrated 

(5) The Freudian concept 

,P the case of c Dora 

a The Wolf d Leonardo da Vinci 

d Little Hans 


Figure 8 1 







(6) The earliest piece of psychological laboratory research on memory
was conducted by

a. Ebbinghaus        c. William James
b. Hering            d. Wundt

(7) In establishing lest validity, the standard error ot estimate Is found by 
which of the following formulas? 


h 2 (t-v.i'+v.i) 

(8) The comparison levet as a concept Influencing choice of social behav- 
.cr has been defined as a standard for evaluafing the rewards and 
0 ^^, based on the model value of all fhe out- 

comes known lo the person This definition was stated by 
a Thib^^anPKellay ^ and deOhlrms 

onhe uLal Ten« r established In a dog by means 

o=cas,onalft!ldSfr"HT 'b'"'Ofcament A new stimulus Is now 
may be at Interval ’ ^ ’"benever the combination Is applied, which 

never acoJSedrr” a days. It i. 

the combination is ^ stimulus In this manner 

ditioned stimulus when'^2 ''^effective, so that the con- 

stimulus loses Its >n combination with the additional 

wth constant reinforcement ’"ften applied singly and 

this phenomenon Plains its full powers Pavlov named 

a external inhibition 

b conditioned inhibition h e’'‘inction 

(10) Piaget has labeled the nrnr« if 'Reconditioning 

interesting spectacles last as ^ '''hich the young infant makes 

a reproductive ass, milaf, on „ „„ , 

h 'auagmloty assimilation I ^'’"^ral'zing assimilation 
d accommodalion 


bon" h" ’’*^‘'’'“^S'''an?your',nV 

Ind “ '’■Ebre 8 1 TOen you the cxamin 

?or.v n of a Kvch f ™ to the facul, 

'-.ypeople) ‘hafoIlowrngiaStere^^fb^^^^^^^^ (a total 



Applying the Criteria of a Good Test 209 


No. Correct     Frequency (No. of People)

     0                   5
     1                   9
     2                  12
     3                   7
     4                   4
     5                   3
     6                   0
     7                   0
     8                   0
     9                   0
    10                   0


Most of the faculty and graduate students got no more than
two of the ten items right. If a passing grade of 65 percent
correct had been set (that is, 6.5 right out of
10), the entire faculty and graduate student group who took the
test would have failed, in spite of their expertise.
Is this, then, a good test for measuring a student's competence
in introductory psychology? No. Why not? Because it is too
hard? Is it too hard because no one could answer more than half
of the questions? This is not the reason for it being too hard; this
is the evidence for it being too hard. The reason for it being too
hard is that people who are subject matter specialists did not have
the information needed to answer the questions. If subject
matter specialists are not familiar with this information, it is
unlikely that an introductory psychology course would cover it. In
other words, we need to evaluate a test, and we need
criteria for evaluating it. Without test evaluation in terms
of criteria, we might be building or selecting tests that have nothing
to do with what has been taught, or tests that are likely
to be too hard or too easy.
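
The argument from the score distribution can be checked with a short calculation. The sketch below uses the frequencies as read from the table (with zero people at each score from 6 through 10); the function name and its return format are assumptions made for the example.

```python
def summarize(freq, n_items, passing_fraction):
    """Return (people tested, mean number correct, people at or above the cutoff)."""
    total_people = sum(freq.values())
    mean_correct = sum(score * n for score, n in freq.items()) / total_people
    needed = passing_fraction * n_items  # e.g. 0.65 * 10 = 6.5 items
    n_passing = sum(n for score, n in freq.items() if score >= needed)
    return total_people, mean_correct, n_passing


# Number of people obtaining each score on the ten-item test.
freq = {0: 5, 1: 9, 2: 12, 3: 7, 4: 4, 5: 3, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0}

people, mean_correct, n_passing = summarize(freq, n_items=10, passing_fraction=0.65)
print(people)        # 40
print(mean_correct)  # 85 correct answers / 40 people = 2.125
print(n_passing)     # 0
```

A result like this shows that the test is too hard for the group, but, as the text points out, it does not by itself say why; that requires comparing the items with what was actually taught.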

APPLYING THE CRITERIA OF A GOOD TEST

„ ,, u„oi, were devoted to test construction 
The first two parts of own measuring instrument, you 

®;:ia"-J-«y^:^riteria can be applied, two practical 
Before the tesxui&^ 

assumptions roust be me ^ 3 , crested m and willing to take the 

<” Ldu"P™'‘=>‘’"'''“" 

time to cvo 


GOOD TEST 





(2) Once you have given a test, you are willing to consider the
results not only as a way of evaluating your students but also
as a way of evaluating your test.


This and the succeeding three chapters will deal with the five
test criteria mentioned above and illustrate how each can and
should be applied to the process of evaluating a test. At the end
of Chapter 11 (the last in this sequence on evaluating a test), the
criteria covered in all four chapters will be summarized in the
form of a checklist. The five criteria will be covered both for tests
that are designed for predicting future outcomes (such as intelligence
tests) and for tests designed for measuring current gains.
The distinction between the two
types of tests is described in Box 8.1.


Box 8.1


...w winicitaiuNi) OF TESTS 

tests '"™!"sonhlSnTwoMraLt 

’’"'''mdTvXaldillorenws'Jr^'^ ^ 

aillerences tor example a general aptitude ability or 

'‘’“T»°dr grawil »’'= 

knowledge sum or aohle,eme„r(p"5™f "’^-nrement of 

tt an aptitude test m ittatlieinalics11*''f^ 7 ^ depending on the purpose 
variety of mathematical tasks thon Predict achievement in a 

nctast I, however the same “"^'dered a psychomet 

3 s udent had learned as the resuii determine how much math 

asanedumetnetest pf Instruction It would be classified 

'=^^dhome,rlc way intelll 

and ev I P ° Pt me example (that is t dumetric way achievement 
metrrc ^ “’!,°'’ done in a wav th development 

'adumetric purpose of the test ^ ^ d°nsistent with the psycho 





A TEST SHOULD BE APPROPRIATE 


A test is of no use if it does not measure what we want it to measure.
If we call what a test is supposed to measure its objectives
and what it does measure its outcomes, then one criterion of a
good test is that it measures outcomes that are consistent with its
objectives. A test that measures what we intend it to measure has
a special kind of suitability that we will refer to as appropriateness.

If a test lacks a purpose or objective, it will be impossible to
assess its appropriateness, that is, to evaluate it on this particular
criterion. The purpose of a test is to measure some characteristic
of people: what they have learned, what they are
like, and so on. The first thing we must know about a test
we are considering or are about to build is what its objectives are, that is,
what we intend it to measure. Once we have decided what a test
is intended to measure, we can determine whether it measures
what we want it to, that is, whether it is appropriate.

We should say two important things about the concept
of appropriateness. First, it applies primarily to the evaluation
of learning resulting from instruction. It represents
an important way to judge achievement tests, but it is not as
helpful in judging psychological tests such as intelligence
or personality tests. Second, appropriateness (or content
validity) is usually not given very much coverage because of its
limited value for psychological tests. However, because
of its value for evaluating the kinds of tests most typically used by
teachers, it is given considerable attention here.

The various questions for applying the appropriateness criterion
are listed below.

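Is My Test Appropriate?

(1) Does It Fit My Objectives?

a. Are there items for each and every objective and no
items that fit no objective?

b. Does the number of items for each objective accurately reflect the
relative importance of each objective?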



S5£jS:-» “srri™ IS,,, r- ~ 

type 





The Importance 
of Objectives 


(2) Does It Reflect the Action Verbs?

a. Does each item for a given objective measure the action
called for by the verb in that objective?

b. Have I used the item format most appropriate for each
action?

(3) Does It Utilize the Conditions?

a. Does each item for a given objective employ the statement
givens or conditions set forth in that objective?

(4) Does It Employ the Criteria?

a. Is the scoring of each item for a given objective based on
the criteria stated in that objective?


obTecnvef? Pi^eding chapters dealing with 

toS ' “ “ r rr ^pe^tfication of objec 
is based orf the^xtent WwST'"''™ ® appropriateness 

must first snecifv th u measures its own objectives we 

PS objecttves then ttsvaKiUte toted P"" 

ability 'Because°this^s”''°^ ^ *° determine a child s mental 

appropnateness In els >>= hard to assess 

purpose of the test is to meae *i h® helpful to say that the 
ships numencat reasoning verbaf laasomng spatial relation 
•nte for example of the SlJoUr 

Remember that aoni-nni- ^ ° Mental Maturity) 

which the " ’■^presents the extent to 

(le objectives) that it is mil °f the content or skills 

test after two weeks of instrl/ ^mple If a teacher gives a 
•hat lest will be based on the extent *!l' “PPiuPnateness of 

nt or reflect the material that ° which the test items repre 
d®nts were expected to learli) a™" "T""* hence that stu 
eness is hke a mapping or mateh”^ those two weeks Appropri 
Itfi “‘™ded to have b^n t“® '^h''' hemg measured 

;4itt 

in terms of pre 

but r„„ 

appropriateness of a test without 





Figure 8.2  Content Map for a Unit and Test on the Deserts of Africa (Social
Studies, Fourth Grade)

Objectives for Unit on Deserts                               Units of Importance*
Given a map of North Africa, students can:                   (columns 1 to 6)

(1) Mark in the location of the three major deserts          ● ●
(2) Recall and write in the names of these deserts           ...
(3) Identify indigenous plant life                           ●
(4) Identify indigenous animal life                          ● ●
(5) Describe how humans satisfy their basic needs there      ● ● ●
(6) Describe what it is like to grow up there                ● ● ● ●
(7) ... rules of getting along together) ... there           ● ● ●

* Based on time spent on each.
● Single test item or point of credit on a test item.

For purposes of illustration, let us take a unit of instruction
and prepare such a map. The map must show what was covered
(that is, a list of each part of the content outline), and to what
degree each part was covered, that is, how much time was spent
on it (presumably, though not necessarily, a partial indication of
importance). Figure 8.2 shows such a map for a fourth-grade social
studies unit, this particular unit dealing with the deserts of Africa.
The unit contains seven objectives that have been listed in the left
column. Units of importance of each objective have been listed
across the top.





The test for this unit has been constructed so that a student
can obtain a number of points for performing each objective to the
degree indicated by its units of importance. Thus, the test not only
represents each of the objectives (no more, no less!) but it represents
each in proportion to its importance or emphasis. Importance
is reflected in test points either by having more items or by
having more complex items for which more than a point may be
obtained.* Importance thus is represented by the number of performances
required for each objective.
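
The proportional weighting described above can be worked through numerically. The sketch below is only an illustration, not a procedure given in the text: the function name allocate_points, the objective labels, the unit values, and the 12-point test length are all assumptions chosen for the example. It divides a fixed number of test points among objectives in proportion to their units of importance and rounds so that the total is preserved.

```python
from fractions import Fraction


def allocate_points(units_of_importance, total_points):
    """Split total_points across objectives in proportion to their units of importance.

    units_of_importance: dict mapping an objective label to a positive whole number.
    Uses largest-remainder rounding so the allocated points always sum to total_points.
    """
    total_units = sum(units_of_importance.values())
    # Exact proportional share for each objective, before rounding.
    shares = {
        objective: Fraction(units * total_points, total_units)
        for objective, units in units_of_importance.items()
    }
    # Round every share down to a whole number of points.
    points = {objective: int(share) for objective, share in shares.items()}
    # Hand the leftover points to the objectives with the largest fractional parts.
    leftover = total_points - sum(points.values())
    by_remainder = sorted(
        shares, key=lambda objective: shares[objective] - points[objective], reverse=True
    )
    for objective in by_remainder[:leftover]:
        points[objective] += 1
    return points


# Hypothetical content map: labels and unit values are illustrative only.
content_map = {"locate deserts": 2, "identify plant life": 1, "describe basic needs": 3}
print(allocate_points(content_map, 12))
```

With rounding of this kind, an objective with very few units could receive zero points on a short test; in that case the test would need to be lengthened or the objective measured in some other way.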

rnniJi'f accurately a tests content and coverage reflect its 
the ui ^ objectives and their relative importance 

ouT° map before 

L.tandl^ y™"- test and then teach your 

prmenls of "i? y™"- map the appro 

U >Tur tesfuC <5'^=" P^P^^y .terns) 

tivcs a test based reflectors of your objec 

-f appropnateness a" canlfexS ® 

as a pobbshed achievement^ constructed by someone else such 
ateness for your numocA determine its appropn 

the content outline of the tesi"* °A correspondence between 
content map for your dassTndTh'* Construe, a 

«c whether their'^conteni ““mine the test items and 

Put in a dot for each item, as in Figure 8.2, and see how well
the test fits your map. Examples of a good fit and a poor fit are
shown in Figure 8.3.

•’’e same for ™>l'omSc?reach"i’‘'"p“ of a test may not be 
0^'n content map One teachers [j determines his or her 

fna> be more closely related to instruction 

"""S :Sr "> *• 

sou hare m“'^ "-cselwo ^de L s d P'-«“m 

' 'Pcasured Perfonnance on each n£ 

’ nc, be do u. 

o>u"m' 'p.rmo™'S;.l^ °™P>n°nce"bM al?o m"”’"'' """S P" 

n>o.a,e,. cwup.ex .be ubjec.ue .’he’“J?e' Sf-S, 





Figure 8.3  A Content Map Compared for Two Tests, One High and One Low in
Appropriateness

[Two content maps, each plotting Objectives against Units of Importance:
HIGH APPROPRIATENESS and LOW APPROPRIATENESS. Markers distinguish a test
item that measures a given objective, a test item that does not measure a
given objective, and a given objective for which a necessary test item is
missing. Bold line indicates outline of content map.]







be no objective for which there are no items on the test (unless
you are using some other measurement strategy for one or more of
the objectives). There should be no items on the test for which
there are no objectives. If you find items on the test for which you
have no objective, then either remove those items or rewrite them
so that they conform to an objective on your list. (You may also
consider modifying your list.) It is important to try to achieve
consistency.
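
The two-way check just described (no objective without items, no item without an objective) can be sketched as a small routine. This is a minimal illustration under stated assumptions: the function name check_coverage, the objective labels, and the item numbering are hypothetical and do not come from the text.

```python
def check_coverage(objectives, item_to_objective):
    """Compare a list of objectives with a test's items.

    objectives: list of objective labels.
    item_to_objective: dict mapping each item number to the objective it is
        meant to measure (or None if the item fits no objective).
    Returns (objectives with no items, items with no listed objective).
    """
    covered = set(item_to_objective.values())
    objectives_without_items = [o for o in objectives if o not in covered]
    items_without_objective = [
        item for item, objective in item_to_objective.items()
        if objective not in objectives
    ]
    return objectives_without_items, items_without_objective


# Hypothetical unit: three objectives and a four-item test.
objectives = ["locate deserts", "name deserts", "identify plant life"]
items = {1: "locate deserts", 2: "locate deserts", 3: "identify plant life", 4: None}

missing, extra = check_coverage(objectives, items)
print("Objectives with no items:", missing)
print("Items with no objective:", extra)
```

An objective that appears in the first list might still be legitimately covered by some other measurement strategy, as the text allows; the routine only flags candidates for review.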


the hsror°"‘’'."“ Behavior You will notice from 

action three components — 

evaluating itions and cniena — ^have been set forth for 

obiectives Of ^ 'terns are consistent with your 

difficulty Often th verbs typically cause the greatest 

kmd^perf' •>-= will specify one 

different kind of n f coTespondmg item will measure a 

quires " he smdem'm verb re 

Item asks him or her to soinethmg and in fact the test 

identify something does not Because a student can 

esting situation appropriate for mL?"’™' “upping a test and 
°^i^chves be the elementa lU objectives Let 

Clements that are flexible enou \ items the 

mr/icr r/mii t/ie o/f,gr ivay aroimd fitted to the oh;ectives 

f “etching of objective "at'dT t""'’ The same kind 

terms of conditions and criterl ?ll demanded in 

eb ^tiieas the condilions TndLlrtT*''™'' t°rth in the 

ure be the conditions uction or performance 

urn h« objective If the ob"t‘ we ‘'’'= “=us 

Provid‘cV,„™P "'V U“ted slates ,h? “V*' 

handlpfl “ “easure that i!' " u map must be 

student s'd ' ' manner If jj,, v Criteria must be 

'east su of “PUEraphl “'if specifies that the 

should be scored ^ ^ response nr " measuring 

ehjectne °f the criterion °f “o“ and 

" ‘t was stated in the 



inws machme .a . ou. c. whack reaC.n, .s 9=.n, ,c ha 
e,eh harder than I expected 


DOMAIN REFERENCING


As we have seen, a test's appropriateness is dependent on the
consistency between the conditions specified in each objective and the
manner in which that objective is measured. As the items written
to measure an objective become more representative of that set of
performances that the objective calls for (otherwise known as the
objective's domain), the test in which they appear becomes
more appropriate.

Writers of tests are encouraged to engage in
domain referencing to increase the appropriateness of their tests
and to improve their interpretability. (See Chapters 11 and 13 for
a discussion of criterion-referenced tests, the byproduct of
domain referencing.) The problem the item writer faces is that
objectives, even good ones, are not entirely specific; that is, given an
objective, a rather large number of items can usually be written.
How does the writer know which items to write so that his or her
resulting test will be high in appropriateness and have clear
interpretability? The answer to this question, according to Hively et al.
(1973), is to define and describe the population of items that
might be written in terms of the set of characteristics that
items might have.




Box 8.2 


THE ITEM FORM 

ran ‘’“'9"“ an approach called the Item Form that 

The appropnate test Items for given objectives 

The Item Form consists of categories that provide the basis for constructing
one or more items for a given objective. These categories are

1 response description

2 content limits

3 item format

4 criteria

5 test directions

6 sample test item

criteria to the crdon^parl Item formir* r"™ objective and 

“biective and the type of test Item to o“"b/l/ons part of the 

test Item are obvious in meanino Mnai^ Obb sample 

to construct is the conlenl^m/te^wh' f ’b® 'b™ 

of the conditions or givens that will h ^ ’''b °* instances 

bve Content limits speedy thl obieri^'””'^'’ b® “bbable for the objec- 
'PSPther yields the tolLingl^mpt b b ''™b.n Putting all the pLrta 

’ be able to add traoticnal expressions in 

2 Confanf liniits |^o 

numbers and fractions°w,ll”I,^ 'factional expressions, mixed 
”°"™'"™olSohcn''V?"“" bnd converted to 

= ot’JZ 'b"'" "r'’™" "bbbf"'- 

'-s.d=n^o~ ^«'-ss^ Add Ihem up 

b Sample (lim 4 “ l%+'y “ "‘'"'bbf *“"0 





If we have an objective such as determining the volume of an
irregular solid using a water displacement procedure, we could
develop a statement of the item domain or item form by specifying
in detail the characteristics of the irregular solid, the
provided the instructions given to the student *0 fo™ o h.s o 
^ j ftso different outcomes that the student 

this Kina or oeiaii the item writer can know and 

the performance wiU be 

precisely describe the popul^^^^^^^^^^ The Item writer 

that might “tentatively select items from this popu 

can tben randomly or p ^epre 

lation of Items m order t 

sentative of •>*“* ' rformance of students on those items 

appropriate and (2) th p performance on the whole 

“i.” 

others is a oonsiderab i,e expected of teachers What 

which can normally and tMchers will exercise care and 

is reasonable t° “P®' conditions of the objecitve so pat 

devote effort to f * ception of the domain from which 

they will have at 1=“' ‘ensitive to at least the major 

items may be ^ teachers can write items with sufficient 

dimensions of that pmam J interpreta 

UoSthey Sbied on them (See Appendix B ) 
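
The idea of describing a population of items and then selecting from it can be made concrete with a small sketch. Everything specific here is an assumption made for illustration: the domain (adding two proper fractions), the content limits (denominators 2 through 4), and the names item_domain and sample_items are hypothetical, and random selection is only one of the selection methods the text mentions.

```python
import random
from fractions import Fraction
from itertools import product


def item_domain(denominators):
    """Enumerate the whole item domain: every ordered pair of proper fractions
    (in lowest terms) whose denominators fall within the content limits."""
    fractions_allowed = sorted(
        {Fraction(n, d) for d in denominators for n in range(1, d)}
    )
    return list(product(fractions_allowed, repeat=2))


def sample_items(domain, k, seed=None):
    """Randomly select k distinct items from the domain."""
    rng = random.Random(seed)
    return rng.sample(domain, k)


# Assumed content limits: denominators 2, 3, and 4.
domain = item_domain(range(2, 5))
print("Items in domain:", len(domain))

for a, b in sample_items(domain, 3, seed=0):
    # Each item is printed with its scoring key.
    print(f"{a} + {b} = ?   (key: {a + b})")
```

Because the items are drawn from an explicitly described population, performance on the sample can be read as an estimate of performance on the whole domain, which is the interpretability benefit the passage describes.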

WHAT MAKES TESTS INAPPROPRIATE?

trioWfc them appropnate then lack of 
If basing tests on to be inappropriate However lack 

an objective base “ s other than a lack of correspond 

of appropnpen^s 4, Consider three of the more impor 

tant ones ,uitable instruction on the objective 

(» rresPonCm -b- ‘b- •“ 

( 2 ) Students are 

,3, Thfles. item^ 

lengths 





Testing What 
Was Never 
Taught 


Figure 8.4


Response Sets


fll. °f achievement testing we have "teaching for 

tne test and testing what was never taught " An appropriate 

™ o7cT‘ •“‘■"S f" what was taught (aLum- 

nLever^r has been geared to the objectives). 

whm ™stau»h^'’a‘^ independLt from 

rherieam,n„ . ' 'I" ^ ^PP'^ 

been privZ for ;natenal The learning musrhave 

must be new m that test ""u 

tice dunng instruction f®"" P'''“= 

a “‘echievImTurgrt^pnne Mme*'’ ® '"“y 

you spring too manv <iir^r. ^ surprises on your students If 

little to do with what ym taSht^ThT ^ 

8 <1 A test need not be mundaL il,^ ^ situation is shown in Figure 

advance what the test items will be° buMb 
ences should bear directlv on »», ' Jnstructional expen 

achievement test or it cannot bA' be covered in an 

are of what students shoiiW hove an appropriate meas 


Relation between Testing and Teaching

[Diagram of test and instruction. Labels: New Objectives; New Items;
Same Objective; Something Else; Appropriate; Part of Instruction (Practice)]


cia some kinds of tests st..a 

a' T" ■"“’■zing itTlh?"™^? ” “■ 

ansxserinp ^ desirah,!,, t ^^e doing sc 

““S'lor chmees" w'r- *“ «■" ““cal desLb? '’■" ^ 

things a'bou,°i‘tS '■"'"8 “ware of it’’mos,°^ *0 respectiv 

“'■■'“■’ychuaUoe.af'S^’a choices am 





A third form of distortion is called acquiescence. This occurs
when students respond with a pattern such as true-false-true-false
or true-true-true, rather than actually attempting to identify
the correct response choice in each and every case. Such a response
pattern is often prompted by disinterest or hostility or by test
items that are consistently too difficult to answer. Students who
have the tendency to overselect the true response choice are
referred to as yeasayers.

Asch (1958) identified three kinds of distortions in reporting.
Distortions of perception occur when we actually see things differently
due to social pressure. Distortions of judgment are instances
when social pressures cause us to alter our responses from what

Box 8 3 


^ Tue tpct" prevention appropriate 

1 1 « Office of Economic Opportunity was roundly criti 
The former contracting (an approach where an 

cized on its evaluation o P contracted to teach disadvantaged 

educational technology 1,355(1 on the reading and math gams 

students and was to be remun Texarkana because 

attained by these “noduced the actual (inal exam as a part 

the contractors had . ,|,eir second larger experiment OEO 

of their instructional P™^ „ «972) to have 

made arrangements rep 

the Identity of pretes ® “ , discover the identity of the test 

any attempt by termination ot the contract even 

would be sufficient , checked to make sure that 

The instructional content waa to b^ mstruction tests 

no testing "'bter'a' ^ 50 that there would be neither 

were to be adm n.swed y ^ ,n the 

Identification of " 55 „ be done by a responsible and 
evaluation of the od 

disinterested corpora ocedures insured that the ccntractors 

No question that t possibility was adequately safeguarded 

could no" teach .0 Jhe qsed was appropriate still remains ,0 

but the question of 





we actually heheve to be so because we have come to rely on the 
smfTr Distortions of action occur when social pres 

what weTr “> “nfom we consciously compromise 

r^es of ■mpress.on Such 

It mav or ma '™ '"*™duce bias into the answers to test items 

or J^g™ ® “^^“ffenng a distortion of self perception 


cultural experience vs actual intelligence 

when IS it a m^elsure oToto" IV '"'®"'Shnee (that Is mental ability) and 
" e test IS supposed tX :r,r:: -intorcement by a cultureT 

^avor of members of a particular />. is biased in 

‘ntelligence test is questionable ^PProprlatoness as an 

bias m many Insta'nc^s °usWmMj bt"'=lzed for cultural 

lhat culture might'tiaTOrte'TtpXTm^nc/'''''’ 

(1) Alley Apple 

a Brick Boogie Jugle 

Piece of fruit ® 

c Dog ^ Worthless 

d Horse c Old 

(2) Black Draught ^ Vl/e'> PUt together 

® Winter s coldwind *o a 

b Laxative ® Cotton farmer 

® Black soldier ^ Black 

d Dark beer ® Indian 

" Vietnamese citizen 

® A vampire 

h A dependent Individuat 
0 *n injured person 
° A brother Of color 

Ml *(5] 0) "" wmiamf""*™ "““sone'ly by per 

— Wns (1) a (2, j, |3j ^ 





Culture is, among other things, a wavelength of communication.
Teachers communicate largely in their cultures to students whose
cultures overlap to greater or lesser degrees with their own. Tests
also reflect a culture, usually that of the test maker. We may say
that a test always measures two things: a set of objectives, and
an ability to function in the culture in which it is written. If
students are in a different culture or on a different wavelength from
that of the test, the test may lack appropriateness for them as a
measure of its objectives. However, we may also say that the
greater the appropriateness of a test based on its representing its
objectives, the greater the likelihood of its overall appropriateness.
Hence, cultural bias is a greater threat to the
appropriateness of broad tests, such as standardized
achievement tests, than to teacher-built tests. Test
publishers should review achievement tests to try to minimize
culturally biased items. Teachers should be
careful to examine not only their own test items but their objectives.

Where achievement . 

tunity to learn (le, the culturally based opportunities to 

they may be ® i.ural bias By virtue of their generality, 

learn, thus infsoducing measuring outcomes that are tied in 
published urrence to cultural experience Items about 

their likelihood of occurre cultural 

sports, for example, may ^ accordingly unless their purpose is to 


tneir iixci.- . biased against gins res.s 

sports, for example, may , u„|ess their purpose is to 

bias lose -PP™P"S‘ed ,a culture 
measure outcomes r 

appropriateness and 

ateness most often considered by the teacher 

The kind of appropriateness m 

,s whether the test look j ., „3„u|iy credit enough impor 
ILndstomeasmeWithd^We^, 

TnSsmglV^VbSng a'fest feel that it is mappropnate this 

“,^e"ct*e«^heyrfspond.o.«^^ 

rSt ® "unfair," ™^,^e"trif pamnts feel that a test 

nicating their m.sgtv.ngs push to have the 

IS not appropriate, 
removed 


Cultural Bias 


THE TEACHER 





‘“Portant judge of a test's 
tZ 7„Trl ! " ‘l'= best judge We might say 

reallvfZ", ^ objeet.ves, ,s 

test <:beck the 

they are the desited onL available and whether 

interpret teacher-i test manuals are often difficult to 

her With'testma ex^L assistance of a staff mem 

Teachers should questior’tL*'^*’°°* for example ) 

achievemenue f,eX?talt‘“'“‘‘lV'’if ^PP^“Pn-'o™ss of any 
achievement test should renit JhM V’'T‘‘“^ A" 

have learned and thus presumahTv ^ 'vants a student to 

2 1 and 2 7 in Chapter 2 showmo^h ^^^^ht (Reexamine Figures 

vam.o the point b^rngmadelere =PProach-it is rele- 

mg and tests as fonows*' ‘'''“‘'“"^bip between objectives, teach 


WHAT IS INTENDED (objectives)
WHAT IS TAUGHT (lessons)
WHAT IS MEASURED (achieved outcomes)


extern »h->. is intende 

results have ProfiLd bv !nS. ?" you tl 

^hose that did have profite?^^^ 

prepared or im been mattenf mstructio: 

'«< resul,”"^ ' ■'='‘>10 to the S'"';' P“°-'‘y motivated, il 

^todenu omE toacW 1 ^ can also u* 

'’^‘'cheenatfauh ' .hrmst™aia"‘lS ma 







Additional Information Sources 

Ebel, R. L. Obtaining and reporting evidence on content validity.
Educational and Psychological Measurement, 1956, 16, 269-282.
Lennon, R. T. Assumptions underlying the use of content validity.
Educational and Psychological Measurement, 1956, 16, 294-304.
Millman, J. Passing scores and test lengths for domain-referenced tests.
Review of Educational Research, 1973, 205-216.
Popham, W. J., and Baker, E. Writing tests which measure objectives.
Englewood Cliffs, N.J.: Prentice-Hall, 1973.





Self-test of Proficiency


Itene™!! important consideration in determining a test’s appropri- 
a the vocabulary level of the test Hems 
c rheihf "f “ P'''‘='’°metric or edumelric 

Cchve' 

poye^shog Improved from pretesting to 

ness cl a given tret Item 'I'® appropriale- 

you Describe how 

Objectives ‘'“"i' "" ePPropriale test of these 

iotal approprmTenret 'usTa doTw'th “ 

a lest Item that measures a given nh"' ? 'Pifibele 

lives twice as important as the others‘°^ 

® 000^1,::::: ;,^ °"'"'"“^"e-eber "ams necessary to measure 

“I a unit teat in a high school h^* 'iia appropriateness 

alad questions lor applym” ler“?'''^* 'ba right are 

Ibe lelt with an appropnale „uret Match each item on 

a Pitsobjeotives ^ '"'Gallon on the right 
b Rellecis action verbs ' ““"b '.'’b ^"">01 of items reflect 
a utilizes conditions lie relative importance of each 

O Employs criteria unit’ 

Does each item reflect the "given” 

W Would Objective? 

simil ‘’b aullable lor a 

rrre?o™r""= --- -- 

what^h*^^ format appropriate for 
V Te tha 'b baked to do? 

Are the answers judged to be cor- 

''> “"a'mine the app,pp„„^ atatld m p~sv 

"ge-’S"'"'’ 

*nis chapter, listed on 





(8) Domain-referencing requires as its primary prerequisite that
a objectives have been labeled as to cognitive or affective area
b behavior to be measured by a set of test items has been specified
c items have been classified as knowledge, comprehension, application,
analysis, synthesis, or evaluation
d items have been written to be representative of that set of
performances an objective calls for
e the acceptable level of performance has been defined in absolute
terms

(9) Domain-referencing helps insure that the items chosen are not too
difficult for students to answer correctly

TRUE FALSE

(10) Check those factors below that can affect a test's appropriateness
a response bias

b student motivation

c conditions of test administration

d testing for skills not in the objectives

(11) You have just onroTlhrote^^^^ the resources 

the names, ,,e third graders in a small school in 

they provide Your s co„si,ucling your end-of-unit test and 

rural Tennessee Yutl a™ ^ Unlortunalely, some ol these items 
have written the ,hat are inappropriate on the right 

are inappropriate nmnriatenessontheleft 

with the reason for inapp P names of the 

a different cultural wave 

b testing what you neve ^ describing 

c acquiesence response feelings as you watch 

the sun rise over the ocean 

III Which ocean is the largest’ 

IV What IS the southernmost 
land mass on the planet 
Earth? 

V The weight of all the water 
in the oceans is 100 million 
tons (TRUE FALSE) 



objectives 


chapter nine/ Test Validity 


th=Snbetw«"'"J™ f PJ^d'ctive tests as 

2 W^nttfy tha relauve stze 

3 °-«:r XT'® 

valid, t/ TamTlyT' types of test 

tt construct and d enteno ’’ P‘''=‘>‘«‘ve 

^ ^cedetmX"''' ‘ypcs of validity 

can “scX“app'lyJ''^‘’'^'='J"cs that teachers 

0 dS™*' types of vahSity 

»S";iiSTS*--ivr,.; 

‘‘™ -ding level anrCTil:" 



WHAT IS TEST VALIDITY? 


The question of validity is different from the question of appropri 
ateness fe validity cannot be satisfied only on the basis of a te 

be true In other i,“^nalyU be enough It may be 

objectives may not, in the bn . ' | 5 to be sure that the 

neLssary to validate 

test can be used to P^^ic s 3 prerequisite If we 

which the course ® lonal gams (the edumetnc function), 

simply want to record * „3nt to predict subse 

rest validity re/ers 

to measure If a test s J whether the outcomes >t 

ematics, how are we t jitude’ How can we tell whether 

ures really raP''®®®"' , . a test of attitudes toward school really 

students who aa°« ‘"8^ ,J,ho score low’ Can we assume that 
like school more than th ,n,ell,gence test are really 

students who score h^iB questions, we “ay ““‘j 

than low aa°>-a«’ ^“for evaluating the outcomes of a test 

Arpast°\te%°res"nt and te^ aS,,, „f certain 

The objective of many tests i ^,^3 [,3, ^ n 

experiences that have achievement tests share this 

Xvement tests and f antodiz^ f^s “hley 

;"52S=SSS=S“S: 

future on the job 


Do Outcomes 
Reflect Purposes’ 





Figure 9.1  Does a Test Reflect the Purposes for Which It Is Used?

Time Referent of Test Outcome: Past

(1) Validity Question: Does the test content reflect what was intended
    to be taught?
    Type of Validity: Appropriateness (content)
    Kind of Tests: Achievement Tests

(2) Validity Question: Do persons who have been taught do better on
    the test than persons not taught?
    Type of Validity: Criterion
    Kind of Tests: Achievement Tests (Criterion referenced)

Time Referent of Test Outcome: Present

(1) Validity Question: Do persons who already show other evidence of
    the quality do better on the test than persons who do not?
    Type of Validity: Concurrent
    Kind of Tests: Intelligence Tests, Competency Tests

(2) Validity Question: Do scores on the test relate to ... qualities
    with which they are ...?
    Type of Validity: Construct
    Kind of Tests: Personality Tests

Time Referent of Test Outcome: Future

(1) Validity Question: Do scores on the test predict success in a
    subsequent related ... area?
    Type of Validity: Predictive
    Kind of Tests: Aptitude Tests, Personnel Tests, College Boards

ilarly use'tests T**** ^o^ipany Colle 

presence of P^^oses f? universities , 

[;« °r her future S 

somewhat on ournom ‘ 

'«‘S IS to predie, fu, “Pcnence the m 

th, some “S,''s ^ 

PPhievement motivation , 




be found to volunteer for more bonus assi^ments ^ 

test ts a measure of “ch.eyemen. rnonvaUon 

The dt^eren. of t pr:ce“;rr for determ.mng the 

manzed in Figure 9 1 Ea t p j whether the 

degree to -^ich a 

test measures what it i , (Procedures for assessing a 

;«t??pTropri— ■" •'“= 

1- . imf a test whose validity has not been 

What happens when you us You may be using a 

demonstrated ^ e purpose, that is. you may be using 

test that IS unsuitable P relate to its purpose 

a test whose outcomes 5,^^^ conclusions 

Such a test does a diss characteristic of the test taker 

based on it may «""f°" n°end to measure Suppose for exam 

other than the one that you I t measure reading 

pie, you had ,, Lu noticed that a student whom 

ability and after y°“ ® ^ ^^der had gotten a low score 

you knew to be an ^alities other than reading ability 

L that the test reaching erronoous conclu 10ns 

If the test IS invalid, y°“ “ smdenfs ability but about all the 
not only about ‘^t P^o^ evidence that a test is meas 

rgwLttrtt%ays tt-e— machinists 

th":bats%VSr:fpre^aictio„s^am™^^^^^^ 
mk^’s subsequent success j^g skill than that required by 

issgssass 

the comparison e 
of validity 


Failure to 
Validate 





intelligence lest tor anv'ifc''^ f ® performance on an 
factors may be based on tb ? i, matter) Some of these 

thing other than what it it mi ^ measure of some 

gence tests may he tests of “ measure of Intelh 

lectual experiences rather tb motivation or of prior intel 

some acceptable evidence o/th^ Without 
place and group with which u^ ^ 

‘5,'P^Pose of decision iLking “ 

other than those they 'themsehien '’n'ldatc tests 

=rs use a published Test aormtl ”!,'™"* when teach 

Jcy should be able to evatae iVit 

Its validity Thus teacbe •*’'= O'"! 

rahdate their own tests bm h^w , ™'y >’°w to 

onately most published tes, t o ^ P‘'’’''''’ed Lt For 

>n the test manual ^dence of their validity 

CONCURRENT VALIDITY

toeffiT” ^olationship between 

‘^“rrentvahd.if ‘'■“uss "''Pressed as a 

'■on coefficient^ Ftlf^' “"‘^^stand^how “"“P' “t eon 
«e Wll return othi '""® “ ‘’■^"ussiS oF t '“.'"""Fret correla 

“‘’’'"““““E'-ncuSm'vZtf™ 

‘"‘crpreting a corral ^ 

CeScR'nT, td",' +100 to -i nn 

'«rcs o„°oTe mt,S!“sS '*™'"^'>n th^twotS 



Figure 9.2  Scatter Plots Showing Test Correlations

[Three scatter plots of scores on Test B against scores on Test A (low to
high): correlation coefficient r = 1.00; r = 0.00; r = -1.00]

,„„elation coefficients A correlation 
graphically displays some correl „„ 

coefficient is displayed by ploU g ons on 

coordinate of the f eph and the seo The resulting set of points 

the second test on the o 

called a scatter plot, '"“a'™ sets of scores 

the correlation coefficient «'a f *„elation coefficients between 
Let us examine actu ^ q„ 3 ,.ty to ■l>aa*cate 

scores on tests intended to me establish 

what IS meant by a corr 
mg concurrent validity 
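
To make the three cases pictured in the scatter plots concrete, the correlation coefficient can be computed directly from paired scores. The sketch below uses the usual product-moment formula; the function name pearson_r and all score values are hypothetical and were chosen only to produce a perfect positive, a zero, and a perfect negative relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores.

    Assumes each list contains at least two different values; otherwise the
    denominator is zero and the coefficient is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each test.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for six students on Test A.
test_a = [15, 30, 45, 60, 75, 90]

# Three hypothetical sets of Test B scores for the same students.
same_ranking = [15, 30, 45, 60, 75, 90]      # identical ordering
no_relation = [60, 30, 45, 45, 30, 60]       # high and low scores balance out
reversed_ranking = [90, 75, 60, 45, 30, 15]  # ordering exactly reversed

for label, test_b in [("same ranking", same_ranking),
                      ("no relation", no_relation),
                      ("reversed ranking", reversed_ranking)]:
    print(f"{label}: r = {pearson_r(test_a, test_b):+.2f}")
```

Real test data rarely fall at these extremes; coefficients reported for concurrent validity typically lie somewhere between zero and +1.00.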

Test (PPVT, Pfatr“j^^perted with ^ test is also 

correlations of 0 . ^ scale (Dunn, 196 ) Verbal 

Stanford Binet laje ’ ® ^ median 'evel of 067 ^ (WISC) 

Srol‘be“w7chsler^^ 

r:^*rhrpm(perba,ms^ 

score on the Wechsler scale 1 

to the 0 67 above 






Figure 9.3  Correlation of the Culture Fair with Other Tests*



- re. The Ce. 


^3 The correIattons'bet^^^"j validity is shown in Figur 
raner“°h'‘"'='’'8'"“Tend,^^“ Fa,r Scale, and o*e 


° — numbers ) — * are local 

^ncurrent tp 

Validity and , ® manual reoortc k u 

scores on ,h 

eood-Sson fo.^:"" '■■gh «>ncu;;en, f, ^-gh cc 

reported between ^!I1® ' The reasn 'i'** “nstitute 

rests provide end ^“*re*re Fair Scalee"^^j^ “PPolattoi 

-'“re as “"“«ren 4hf, -'■‘<='l-gen. 

correlation Of *c«s as thev ^ It tends to co 

dence for ,ts con^!l^ Full s Jl! another Ii 

correlate higher However i ^ positive ev 

since It ic , ^ ^ t" 'VlSC-Perfnr^ 've would exnprt ,t t 

:«.si5;sH^"5=^S 

* ^ a nonverbal anH u questio 

t and hence “cultural fair 


' might ask 'U'l. 

re'-if'J^SgVerhe, “ -V reec 

"crmance have 


an inter 





CONSTRUCT VALIDITY 


Like concurrent validity, construct validity has to do with the
relationship between a test's scores and the scores on other measures
with which we can expect them to be related. The correspondence
between two tests used in establishing concurrent validity is direct:
both are presumably measuring exactly the same quality. For example,
mental ability test #1 and mental ability test #2 should be related
because they purport to measure the same thing; this is concurrent
validity. Construct validity, on the other hand, is an attempt
to relate tests that measure different but conceptually connected
qualities. For example, aptitude tests measure the concept or
construct of potential for achieving, which presumably would affect a
student's degree of achievement. Therefore, relating an
aptitude test to an achievement test represents an attempt to establish
construct validity. (The differences between concurrent validity
and construct validity are illustrated in Figure 9.4.)

Another example of construct validity would be to take scores
from a test of sociability that measures the construct
sociability and show that students possessing more of
this construct will have more friends than students possessing
less of this construct. By comparing the number of people who
report to be friends of high test scorers with the number reporting


Figure 9.4  Illustrations of Concurrent and Construct Validities

[Concurrent validity panel: Mental Ability Test #1 Scores related to
Mental Ability Test #2 Scores]





Figure 9.5  Correlations between DAT Subtest Scores and METRO Reading
Score

[Chart relating each DAT score to the Reading Achievement Score (METRO)]

of the test can be 

Ilhistraiion 

Benneu e*t'aKT9^ Aplimde Tests (.DAT 

bojs between thejr scores on”^ teponed for 68 ninth grade 
Reading score dented from ®“htesl of the DAT and the 

Achie\ement Tests 

esceptforS^alSla“OTSM'dae“”rc “*' subtests, 

sent the components or constninP^^ Speed and Accuracy, repre- 
ing skills would seem to be based ™, ''hw^h the mastery of read 
we would expect some relationsh,'.^* «ception of those two 
tests and the Reading achietem™ “tth of the DAT sub- 

mla.ions closest to reraareTors^'J“te Note that the two cor- 
Mureoter, the DAT ““d Clencal Speed 

leh','.^ "><«< ciearly S T constLts 

eehietement are the Verba! tit'etopment of reading 

sS 

a, related to the METRO Relations Sub- 

’ 'Sc uouM expect ifcc i«t r achievement score 

®>niruct achievement 

"'■■iurof rheclrnk'ltfen^' correlalion of 





shown in Figure 9 6 For Verbal Reasoning ^ ™ 

IS based on a smaller spread of space 

ReTauLf tre^ol“roPelat.on is based on - 


~ ssrsS's.i'r .5;;-“= 

of aptitudes and the METR R a^l Although correlations to a 
the construct validity of «' Reading achievement score do 

rttlf^wTo" ~lations do tel. us that reading 


Construct 
Validity and 
Correlations 







achievement is a manifestation of constructs such as Verbal Reasoning
aptitude and Language (i.e., Sentences) aptitude. Since
these DAT subtests are intended to be measures of verbal and
language aptitude, and since verbal and language aptitude constructs
presumably underlie reading skill, the reasonably high
correlations between the Verbal Reasoning and Sentences subtests and
the Reading achievement score do tend to provide construct
validity for these subtests of the DAT.


PREDICTIVE VALIDITY 


scares Z 'he degree of correspoaderrce between 

shown in Figure 9 1 "’ensured by the test As 

The queslion is wheibe has a future time frame 

Picseltcan ^ 

Tests (DAT) u^sed^r'Illusirat”'’'' '^’ScTential Aptitude 

dictates that pertt reaie If logic 

should do better in courses related ^ “f,‘"Pde in a particular area 
mg that aptitude then scores on tb ' ri'J’-J ' persons lack 

vanous courses by identifying antimd^l^ 
ate partially dependent • In the DAT Mn I'lu 
a correlation between scores on ib m i"“^ (Bennett et al , 1966), 
attained at the end of eleventh vr “ hlechanical Reasoning subtest 
science course for 107 boys ,s renoned"'’ “ ‘"^'^'h year 

s ma^itude indicates that the n 1 1 ^ 58 A correlation of 

science grade (Correl *^1 reasonably pre 

^Sh for predictive validuy « are considered 

“"d acLraSv „ ^ '=^1 presumes to 

?=s ha?"'"?'* \P=rformance in an 

The n lu ''ahd.ty for ’’'f ‘ast (. e , the sub 

The DAT Manual also „ 'T'e of course 

between 

Ptades earned m English a seme ™ , at S^ade, 

• '’I' >« gmls The 


• Thu roaionin, u „ , girls T 

students taking i 






[Cartoon caption: "Your aptitude test shows you're very good at taking aptitude tests!"]

. further evidence of the pre 
maenltude of this the DAT Manual also reports 

Ticuve vahdtty of the DAT ’ Space Rela.tons subtest 

a correlation of 69 Itetween ^ training pro 

taken at the tune of adm'S of the program 

n/i orades obtained by aotitudes are predictive o 

d we assume that is correlated to the 

some Ptedintw^ uieasure 

presumably intends tom 

, ,o say that because a test t£t 

first The fact t ^„„hlv fifty percent overlap between 

clearly other facto 


Correlation 

Does Not Equal 
Causation 





not necessarily mean that the first caused the second. There is also the possibility that both the test score and the subsequent performance have had a third factor as their primary source of influence. For these reasons, we say that correlation does not necessarily imply causation. Correlation does imply relation but does not unequivocally identify causality.

esns^r apiitmle in space relations 

related^ ’’eeausc the Isso arc cor- 

success lo succeed is more llhcl> the cause for 

aWe- hat an rossllile-pcrhaps prob- 

a sinat * “"‘I performance arc causalls related from 

■ntposs.hic to tell .shelter thS am or noT 
ea?s“ """• ■^“"elation that n relationship 


exislfbcu.ccn°ihc''mfanrmon f”"'’’’''' Forrclallon of .65 

amount of mono spent on '''= 

Ihe death of babies Muses itn.^s 

cause iho death of babies’ H it ° '’'eah up or lhal brol.cn roads 
'■keh IS ■haralhirf/ae.or'e^'”'''"'^ "•>«' '^em, more 

mdustrialiraiion or amount’of Lniu,?. 


mdustrialiraiion or amouni of Jili . 'emPeralurr or Icsci of 
mg, conlribulcs to both outcomes " r"'' " 

"P=>j^““smgthelsso tocorr5atc 

attendance and gradcs^coTObic'^eo'^T’ Suppose 

ence tn the classroom alone caus^ ^ i' “*'“me lhal pres- 

z .Hem Ho.h■^Toto^^re;,-;rd.^re^c-rd"^ 
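The third-factor argument in this section can be made concrete with a small numerical sketch. The data below are invented for illustration and are not taken from the DAT Manual; the "third factor" could stand for something like motivation to succeed. Each outcome is built only from the third factor plus its own small disturbance, so neither outcome causes the other, yet the two still correlate highly.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical third factor, one value per student (assumed data).
third_factor = [1, 2, 3, 4, 5, 6, 7, 8]

# Each outcome depends on the third factor plus its own disturbance.
# Neither outcome appears in the formula for the other.
test_score = [2 * t + d for t, d in zip(third_factor, [1, -1, 0, 1, -1, 0, 1, -1])]
performance = [3 * t + d for t, d in zip(third_factor, [-1, 0, 1, -1, 0, 1, -1, 0])]

r = pearson_r(test_score, performance)
print(f"correlation between test score and performance: {r:.2f}")
```

A high coefficient here reflects the shared source of influence only; it says nothing about one variable producing the other.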


CRITERION VALIDITY


Its taf e* at&Xr" 

effect of trainin ' T '.'Bure 9 1 1''^ u°' “ 

test TI, ' rnmne already rece.t,.j ... is in the past st 

fica.™ off "«.™ tor .h,™t ""8 used ,o"c.‘a,u 
already that is n based on the 





Let us rctum to the DAT Uamicd for an example of 

occupational groups ( identified with the occupation or 

groups experienced or beca and science majors in 

education ) The high scores ^ liberal arts majors on Space 
Numerical Ability, the low ^o relatively greater 

Relations and Mechanical „„ Verbal Reasoning 

E;"rdsUhed tradesmen support the criterion validity of 
‘■'^relondillustrationo™^^^^^^^^ 

to the classroom teacher Ij,™ represent an untrained 

tion on a particular set o object they constitute 

group After such instruc i nroficiency on the given objectives 
a trained group If 1 '°“'’ ,5 Should score lower on it when un 

has criterion validity student ^ pretest) than 

trained (le m^n given as a pos.tes.) 

when trained (le as both the untrained and 

In this case the same g ruction intervening The post 

trained group with tr reference to the pretest 

test s validity is evaluated witn a ,est with respect to 

Criterion validity a test has been developed it can be 

its obiectives or content A £ presumed content sp 

ubm ned to a panel of experts ^ ,„d validate the con 

"whose op.n.-s se^rve as the conmnt of *= 

tent coverage of the tes acceptance of its contcn y 

dinator or sf 1=^. matte^^ S of experts 

m"sfud»tV P^%„t.t^o^g:"aSl.y mnks in the appropriate 
should be drawn r .mbtests to discriminate by field 

“=s!~r.fsrr.%.- 

related to their 





subject matter. If the purpose of the test is to measure students' preparedness for jobs, then the panel of experts should be drawn from employers and employees in the specific occupational area. In a sense, both of these latter types of groups (i.e., college professors and representatives from an occupational area) can attest to the validity of the content of a test as a measure of the kinds of skills and knowledge that are prerequisite to entry into the respective institution of each (i.e., college and the job). The essence of
nXn 7 “ described here, then, is not that the test fits a 

Id Ik' "’“I the test fits the situation for 

ttdlhdZd 

the matte^of '"‘’“*'‘5' ™'' This is 

obtamed hil s? themselves 

quit ™mdar 1^“," tiiuLts This is 

the terms "predictive validity" and "criterion validity" synonymously. If those who were high scorers on the test are now succeeding in higher education or on the job, then the test has criterion validity.


DETERMINING A TEST'S VALIDITY


dictive, an'dmTcSiT f ° ’ “"“Tent, construct, i 

W-jle the preceding fetSatuT 

•mderstandmg test validity this sec? ussist you 

‘"de^^uimgtestvahdit/' " “ intended to assist J 

eppl.es '“’’the daeiVuon oTSt^^^^^ •' 

'=bly Test VALID’ ‘‘““Y ‘s shown below 

» Do students™™ are mr" ’’"'°™euce Levels 

ter in the test area perfom'if ■'“'^Sed to perform 1 
’’ Do different stuZ^ ” the test’ 

perform differently on the van''"'"' ‘"‘'Srccs of experie: 

-s;orwh.ehthe^;^\-t',-^um.^^_ 





b. Do students who receive appropriate teaching perform better on the test than untaught students (or does a student perform better on the test after training than before)?

(3) How Do My Colleagues View the Coverage?
a. Do my colleagues in the topic area or at the grade level agree that all necessary objectives and no unnecessary ones have been included?
b. Do they agree that the items are measuring the objectives?

(4) Does It Measure Something Other than Reading Level or Life 
ft: the demands it mahes on reading shdl within students 

b. Is performance independent of group membership or any other socio-ethnic variable?

, ,^,ts validity is to ask yourself AbiUtyofthe 
One approach to determining ® the characlenstic Test to 

whether you have some other « jf test is one way Discriminate 

or skill presumably ^ another way independent of 

to assess performance and yov ^^hile there might be 

the test, will the two ^;™’ndent source of judgment if it 

some shortcomings m this i P information as 

represents a second way he external results would 

your test and the tvvo ® test The 

reinforce and provide ^ J,,ty! would be small 

both are invalid P°“'t writing a list of 

that this approach the tvvo to see whether > 

constructmg a test and «»P^at one ordinarily ^ jj® 

,s measuring your ask a judge the . „,./abil 

thts approach to fam.har with ' 

test the or °es^ represcts 

r ‘Trfdrmeiif, -/.--firs ,- 

an attempt to the lest « ‘■“'^^,eimreiit la/.dirj 


the level of a students CO -—r 

distinguish between more 





in the target area? Does it discriminate between good and poor performance? Opinions of other teachers (or of yourself), of students, or others who see or have seen the student perform in electronics could be used as a means of independently* assessing the competence level of the student in the target area by using a simple scale that has at one end "the student is very competent," and at the
other end "the student is not at all competent." The numbers 1-10 can be placed along the scale, and the ratings averaged across judges to give an overall judgment of the student's competence. The judgments for each student are then compared to his or her performance on the competency test.
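The comparison just described (independent ratings on a 1-10 scale set against scores on the competency test) can be sketched as a correlation. The student labels, ratings, and test scores below are invented for illustration, and the use of the Pearson coefficient as the index of agreement is an assumption of this sketch rather than a procedure prescribed in the text.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical 1-10 competence ratings from three independent judges.
judge_ratings = {
    "A": [9, 8, 10],
    "B": [7, 6, 8],
    "C": [5, 5, 5],
    "D": [3, 4, 2],
    "E": [2, 1, 3],
}

# Hypothetical scores of the same students on the competency test.
test_scores = {"A": 46, "B": 40, "C": 31, "D": 22, "E": 20}

students = sorted(judge_ratings)
average_rating = [mean(judge_ratings[s]) for s in students]
scores = [test_scores[s] for s in students]

validity_estimate = pearson_r(average_rating, scores)
print(f"agreement between judgments and test scores: {validity_estimate:.2f}")
```

A coefficient near 1.00 would suggest that the test orders students much as the judges do; a coefficient near zero would suggest that the test and the judges are not assessing the same competence.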

This procedure of determining a test's validity based on its ability to discriminate is classified as criterion validity. A second procedure is a comparison among individuals of different experience. If the test is valid, those individuals with greater relevant experience should perform better on the test than those with less relevant experience. Judgments of relevant experience, the criterion variable, can be obtained from the students themselves, using a scale like the one described above. For example, to validate a test in electronics, you would ask each student to rate his or her experience in electronics (based, for instance, on being a ham radio operator or on the use of related equipment) on a scale such as this:

NONE   1   2   3   4   5   6   7   8   9   10   MUCH
EXPERIENCE IN ELECTRONICS


stntotf s?ore®on° te'’,e"rr T" oompared to ea 

pcrfi™"'“ ^vfcrdrf eaTf 

a no cS?n™ “"^"■''''■<*“01 ■temtn ^ “oms T 

'v™s of us abUity to disc ™ ° 'oat mst. 

* "Independently" means without first seeing the scores.





le^cls as judged bj mdependent 

.emune .he ^ahd.tj of .he ‘“‘^^^Ttudents scores on 
i.ems by dCermining ludges and con.mumg 

.he .o.al .es. and .he globj;! - ^ Us'Scen. ■" 
to use the test if its o\eraIl \a!idit> ie\ei 


Recall that predictive validity refers to the ability of a test to relate to or forecast a future outcome. If you can determine a future outcome that your test should predict, you can compare test scores to this outcome to see if the expected relationship occurs. To continue with the electronics example, your competency test may be expected to predict ultimate success in an electronics job upon graduation or next week's performance examination.

Construct validity refers to the relationship of two different tests whose conceptual bases are expected to be related. If the teacher's midterm or final examination in a subject-matter area is one of the two tests, a standardized achievement test in the same global area or of general knowledge could be the other. Correlating scores on the two

substantiaffy 

merit or tne a js much more someone has 

- 

ers have come o „retest is different from a 

‘‘“It IS “eT^rway'of finding out whether students 

diagnostic test T e H,fferent types ot correlation coeffi 

'■£-2SSi^3SSi:;S'= 

ally by plotting 
237 


Ability of the 
Test to Relate 
to an External 
Standard 






® P’ws z equalled 9! 

>° -.e. .he pan.cuh 
IS 10 find out whclhcr students have 1 P^ipose of a prete 

■ns to teach before has, ne had „ ^ ■““■"’P 

n por„„, f„,„, “ There ts n difference /I pretest 

[““■'■ "'c mi \vh,lf a of mstmctic 

m .™,on ■•‘l<«sno.heIp;„u i"''™= >«' may con.nbu.e 1 
If i ^ hetermine whether your achiev 

before your st, 

thee ha™ T ? nnd ( 3 ) *= ■nstmmen. . 

not bt-cn h r lonn '*? ^^^rnate form) aftt 
"<ll ixflcct totally inerrwtivp f expenenc 

■hm on7he nfr fm- theS“ T'’'"' '“t ts valid 
inicnded tole^c'h' nssume that eou h™'"’’ Postlei 
"-ach and tha. throueh Veaeh^'" '““8'” 

sn teaching, some rcasonabl 





degree of learn, ng has 
^:sn:\n;aX:a“reye°absenc^^^^^^^^ 

of the total ineffectiveness ° ™ effective) However it post 

basis for considering , pretest’* scores then you 

test scores are substantially igh t^ ,,, ^ruction was reasonably 

effective, but also (2) ‘hat technique how 

this effectiveness and hence is answers to your 

ever, you must be sure that yo p ypp must cover your 

A third way to deal «'*>' ”pher) ,s“to sXit the 
practical one for the classroom r.gh, 
of colleagues and others ‘" f, ‘paction and ( 2 ) the right items 
objectives have tee" ohosen represents a way of “*"’8^°" 

chosen for testing (^tis app j.spussed m Chapter 8 ) While 
tent validity or approP""'®"' a ypur objectives the objectives 
the ilems on your test mey f grade level and subject 

SW?i=l3£ 

iiissSss 

expert judgmen s nrocess They can 

-Satets too 
meet and elicit eac 

Utt pretest - ^ 


Test Validity 
through 
Consensus by 
Colleagues 





can ask each other: "Are these the right objectives and right test items for this age group? For these particular students? For this stage in the learning process? Do the test items adequately reflect the objectives or the domains within which the objectives fall?" It may be very difficult to answer these kinds of questions if you have teachers from only one grade level or from only a single discipline; you are bound to find conditions under which the same objective or same test item may be suitable for more than one grade level or for more than one discipline. A review of techniques for determining agreement among judges is provided by Light (1973).


Invalidity Based 
on Reading 
Level and 
Other 
Biases 


“udentW ”'17 ‘h‘=y measure a 

Z'Z are^n.ended to 

wnter (e b reading level of the 

(eg. Student) reading level of the user 

onstmrknow:edgrbrprowd?nE 'f' 

it they know it AlthouBh then. ^ railed for even 

students reading levelf and thel^ nh w relationship between 
■I'm .5 being tested 10 . 7,1^ '? knowledge 

should most properly be called”""^j 'ry®* 

subject matte? aSmeni Tn , "°t a teat of 

good rule la to use small words a?d sh?r'i“' ‘'‘=“‘*‘"8 “ 

'ungcr a Word or a sentence tb. ®™<“res In general the 

understand ■ ““re difficult it is to read and 

bias *“rha5letmer«“riS’no‘!."°r° ‘"‘"^“e other forms of 
saine answer on every item Th checking the 

aludcnl might adopt tins resnonsp ™"ous reasons why a 
"U> reading the i.em or hos^d? a a ™ '*'= 'he is sleepy Ld 
l|kes Iha word ,n,e muSene, fh™' or 

tncapc this kind of bia-; n " does not know 

ZZ On^“ - ‘rst fails to 

siudcnls thus 




\yav, 4 I.CSI iw 

'luacnt^ It, --.-V4 4i lest that ° acquiescence re 

om rccarH their lendr* *'^^^resting and involves 

regard lo the items th?mSv« answers with 

app.,eu to .„.s i„ 



Additional Information Sources 


American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for educational and psychological tests. Washington, D.C.: American Psychological Association, 1974.

Cronbach, L. J. Validation of educational measures. Proceedings of the 1969 Invitational Conference on Testing Problems. Princeton, N.J.: Educational Testing Service.

Cronbach, L. J. Test validation. In R. L. Thorndike (Ed.), Educational measurement, 2nd ed. American Council on Education, 1971, Chap. 14.

Cureton, E. E. Measurement theory. In R. L. Ebel (Ed.), Encyclopedia of educational research. N.Y.: Macmillan, 1969, 785-804.

Dick, W. & Hagerty, N. Topics in measurement: Reliability and validity. N.Y.: McGraw-Hill, 1971.





Self-test of Proficiency


(1) Test validity is concerned with
a. the accuracy of a test.
b. the extent to which a test measures what it is intended to measure.
c. improving a student's ability to score well on tests.
d. the meaning of a test's scores.
e. none of the above.

(2) To determine the validity of a test, it is necessary to apply (internal, external, academic) standards.

What can be concluded about the relationship between variables A and B based on the scatter plot below?
a. their correlation is negative.
b. their correlation is high.
c. the predictability of one given the other is great.
d. all of the above.
e. none of the above.



Y 







(5) Match the type of validity at left with its definition at right.
a. concurrent
b. predictive
c. construct
d. criterion

i. relation between scores on different tests of related variables
ii. relation between scores on different tests of the same variable
iii. extent of differentiation between scores of a trained group and scores of an untrained group
iv. relation between scores on the same test taken twice
v. relation between test scores and future outcomes


lined ^mnletina a chemical technology training 

a Students who chemical Technology Occupa- 

ccmprencrixaminalion than sludenls who had no. had 

such training Sociability Scale are found to 

t, students ® nominations than those who score low 

receive more between scores on the Gales- 

o A positive the Iowa Silent Reading Tests 

McGinilie Reading ^ ,be cognitive Abilities Test taken 

d Students who '"5" graduate from high school with a 

,y, Vou have ,US. cons.^-d a^te. - 

□escribe m one se ,af|on validity of your test 

drctive validity, and 0 

H a test to measure students' attitudes to- 
rs) YOU have lust " 3 a„,e„ce each how you would deter- 

® Ld school °«-^„:::,ryandh cons, rue vandinr 

° g have been asked by the head of the 

;:icri 

nnning Us abi UY p„„ciency 

(10, Describe two 
in this book 



chapter ten / Test Reliability


OBJECTIVES

1. Define reliability, a test criterion, as test accuracy or consistency over time and items.

2. Identify the standard error of measurement and its relationship to test reliability.

3. Identify and contrast five different types of test reliability, namely: a. Kuder-Richardson Formula 21, b. parallel-item agreement, c. split-half (including the Spearman-Brown Formula), d. alternate forms, and e. test-retest.

4. Name overall sources of test variability and sources of error, and give examples of each.

5. Identify ways of building reliability into a test, namely: using a sufficient number of items, targeting, controlling conditions of test administration, controlling for general or specific skills, setting intermediate levels of difficulty, and following item-writing rules.

6. Describe four checklist procedures that teachers can use to determine and improve the reliability of a test, namely: a. determining parallel-item agreement, b. using item analysis, c. examining student response patterns, and d. improving reliability of scoring.


NO MEASUREMENT IS PERFECT 


No measurement '"=‘™^" Q,i„eter nor a human device such as 

mechanical device ^ perfect renection of the 

a test gives a result ,he same thing twice under the 

being measured If you ,he cruel same measurement 

same conditions you "“3 J'raJ measuring procedure the 

each time However, 8''’“ “ “ ,„,ce would probably be so 

values you would get it y ^ Measure the length 

dose that you could now do it a second time, the 

of this page with a twelve inch ™l^^^ ^ ,3,3 pleura e 

result IS probably ‘\=™tues you get if you measure twice will 

measuring procedure, the v due y ^ j f 

not be as close Try nteasurinS^t^^ lengths 

ruler Does the page measu 

each time’ property of an object or person, you 

When you measure some p P ’ j „f ,hat property In 

are "tmg f Tr^page, y^u t 

measuring the 1^08* ^ * f, ®,hat is, its true length 


Average would^e ao„..-^^^^ 

-fsra&nt byu-^^^^^^^^^ .,3 .roe length 

even so, the ine«“ 


°''“u nd be exactly the same achievement o, 

learning The t learning score „,_3 .est How 

t%Se^'rnns;-"^ only how much ihe 

ever, because the 







student has learned but also how accurate a measuring instrument the test is, the score for each student is a less than perfect measure of how much he or she has learned.

Because teachers do not want to come to conclusions about student performance (or anything else, for that matter) based on the scores of inaccurate tests, they want to build tests that are as accurate as possible. But since no test can be absolutely perfect, it
is important to know how accurate a test is in order to know how much confidence to place in its scores. To designate a test's accuracy, the term reliability is used. Reliability indicates the degree to which a test is consistent in measuring whatever it does measure, that is, the degree to which the test measures the same thing time after time and item after item. Consistency over time and items is basic to the concept of reliability.

student more'^than once 'ih ^ students, or to the same 

scores (that is the score’ variability or variance m the 

of this vanabihw's a tniel^ T,"" ‘hey differ) Part 

ored part reflects error in 1^1'?^'!"?!,"’ PPOPatty being meas 

■he relationship of error .rtt rest 

JorS "a:^“'“oU 'r-' 'he mformat.on we 

eurement when we use the '"hat we get 

■o this noise or inaccuriv At th!“* contribute 

<0 only these noise factors , ha, ’"Tv",' "estnet ourselves 
degree each tune the is cl " he present to some 

fee tors ,ha, are withmThe testTolr to noise or error 

f^'hat error by determininE thl« ^ We evaluate the magnitude 
The standard error of Suren, °l •neasurement 

person’s true score on ?e« a between a 

■M 'tS"and^"“|'"S ‘he'tro“meas"uremS“™® “ r®*" '“®“’ 

a"'ou„,,o,vh,ch'’fh™T^““" ‘he toor“e'TheT“ 

f the true scire lh"'“? “a the M re?'”™' ‘he ‘est 
how close each meal™ J’"' "'a"dard erm “ representative 
measured score is to the T ^ "'easurement tells 
me true score and hence how 





accurate the test can be. Obviously, the larger the standard error of measurement, the less accurate the test scores.

As we have said, though, all the variance in a set of scores cannot be considered error. Some of it is true variance, reflecting the fact that students will actually vary on the property being measured. Though it is obviously important to have a measure of a test's accuracy, it is not practical to give a test 100 times in order to determine its standard error of measurement. Instead, test accuracy is usually expressed not as the standard error of measurement but as a reliability coefficient. The reliability coefficient is that portion of the variance in test scores that is not the result of errors of measurement. It is, in fact, one minus the ratio of the error variance (the square of the standard error of measurement) to the total variance. As we increase error variance relative to total variance, we decrease reliability; as we decrease error variance relative to total variance, we increase reliability. The goal of the test builder, then, is to decrease the error in the test, that is, to increase its reliability. How is reliability determined, and how can it be improved?
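The relationship among error variance, total variance, and the reliability coefficient can be sketched in a few lines. The function name and the numerical values are invented for illustration; the only thing taken from the text is the relation itself, reliability as one minus error variance (the square of the standard error of measurement) over total variance.

```python
def reliability_coefficient(standard_error, total_variance):
    """Reliability = 1 - (error variance / total variance).

    The error variance is taken to be the square of the
    standard error of measurement, as described in the text.
    """
    error_variance = standard_error ** 2
    return 1 - error_variance / total_variance


# Hypothetical test whose scores have a total variance of 16.
low_error = reliability_coefficient(standard_error=2.0, total_variance=16.0)
high_error = reliability_coefficient(standard_error=3.0, total_variance=16.0)

print(low_error)   # smaller error variance, higher reliability
print(high_error)  # larger error variance, lower reliability
```

With the total variance held at 16, raising the standard error of measurement from 2 to 3 lowers the coefficient, which is the direction of change the paragraph describes.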


FIVE PROCEDURES FOR ASSESSING RELIABILITY


, , content outline (page 68) for which 
It IS not difficult to •"’^6'" /eppropnate, although few teachers 
any of 100 tests would be app P rnany Most 

would welcome the tas ^ situation But if you 

;Td"^elopefirtS^%^^^^^ 

case of the single test ^vere given 100 times Tha is 

Sow do yoXVwThTthe 

’’“'"‘‘^pSybeingmeasured ^^rpres«^ 


and consistency of a 








Kuder Richardson 
Formula 21 


of true variability in the total variability of test scores. In Cronbach's (1970) terms, reliability can thus be thought of as the ratio of the signal to the combination of signal plus noise. The more noise (or error variance) a test picks up, the less reliable it is.

The impracticality of giving 100 tests for the same content outline or the same test 100 times has led to the development of other procedures for assessing the reliability of a test. Each of
Wbmating the probability 
acma^^arntfr'^t ‘.f true score (or 

lakei Five " s treasured quality) possessed by the test 
bldesc s reliability will 

coeffieient a^, ‘’'f °f "^ich utilize the correlation 

coemcient as a statistical indicator of reliability 

!oX™m“ThartTs®.fTu ? administrations of a test 

■ng Lch Item aT! tS If <=°"*-der 

degree of agreement then vo'n'r ^ ® 

or consistent measure fv^^i P^’^sume the test is an accurate 

a consistent mTasu^of I “ ‘he test is 

of validity But whatever that test i^me ® 

agreement on students item c/. * •’Measuring if a high degree of 

■nent or correlations a^nglteiST'""”'"® 

o a test It IS shown below 

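$$\text{Reliability} = 1 - \frac{\bar{X}(n - \bar{X})}{n s^2}$$

where $\bar{X}$ is the mean score obtained by the class, $n$ is the number of items on the test, and $s^2$ is the variance of the scores (or square of the standard deviation), a measure of how widely the scores are spread.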

■•co.Jm“?h™e's.^ tecr .fe 3bTr ^f 

Published free test r J] ^ ^ perfect reliability 





«hcn based on the agreement among test 60 

built tests arc usuallj considered adequate uith reliabilities of 

be scored in this manner if K R crores 

class of 10 students has obtained the following scores 

12 9 


11 

10 

9 

9 


Since these scores add up to 80, and since there are 10 students, the mean score would be 8.0. The standard deviation is obtained by subtracting the mean score from each score, squaring this remainder, adding the squares, and dividing by the number of scores to give a variance (of 8.2 in this example).

If we put these values into the formula, we end up with a reliability of .67, which is considered adequate for a teacher-built test with as few as 12 items. You can see that as n, the number of items in the test, increases, the reliability would increase also (all else being the same), since the fraction to be subtracted becomes smaller.

Let us try one more example in order to make a point. Consider the following scores on the 12-item test:

12 11

12 10 

11 10 

11 10 

These scores have a mean of 11.0 and a variance (s²) of 0.6. Put into the formula, these values yield a reliability

appear m tnea 


[Figure 10.1: distributions of test scores; vertical axis: Frequency of Occurrence]


Why does the first pattern (or, as we say in technical terms, distribution) yield a much higher reliability than the second, at least in terms of the K-R 21 formula? The answer is important for an understanding of reliability. The K-R 21 formula is based on the assumption of a normal distribution of test scores, that is, one with the greatest frequency of scores occurring in the center, at the mean, and a progressively decreasing frequency as one moves to the extremes. This distribution is shown in Figure 10.1. Distribution a comes from the first example, having a K-R 21 reliability of .67; distribution b comes from the second example, having essentially no reliability in terms of the K-R 21 formula. Distribution c is a version of a normal distribution representing the distribution of true scores in the total population. As the obtained distribution approaches the normal or presumably true score distribution, the reliability approaches 1.00.
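The two K-R 21 examples can be checked numerically from the summary values given in the text (a mean of 8.0 with a variance of 8.2, and a mean of 11.0 with a variance of 0.6, both on a 12-item test). The sketch below assumes the simplified form of the formula, 1 - M(n - M) / (n s²), which reproduces the .67 reported for the first example; the function and variable names are illustrative only.

```python
def kr21(mean_score, variance, n_items):
    """Simplified Kuder-Richardson Formula 21 (assumed form).

    reliability = 1 - mean * (n_items - mean) / (n_items * variance)
    """
    return 1 - (mean_score * (n_items - mean_score)) / (n_items * variance)


# First example from the text: widely spread scores.
first = kr21(mean_score=8.0, variance=8.2, n_items=12)

# Second example from the text: scores bunched near the top.
second = kr21(mean_score=11.0, variance=0.6, n_items=12)

print(f"first distribution:  {first:.2f}")
print(f"second distribution: {second:.2f}")
```

Under this form of the formula, the second set of scores, with almost no spread, gives a value below zero, which is consistent with the text's point that a tightly bunched distribution shows essentially no reliability by K-R 21 even when instruction has been successful.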






If, however, the instruction has been successful it is not 
totally unlikely that a distribution such as b will result Does this 

reliability 


Parallel-item reliability is most appropriate for tests on which proficiency by a large segment of the class is expected to be demonstrated (that is, criterion-referenced tests). Parallel-item reliability is based on the determination of consistency of performance by students across items that are intended to measure the same objective.

Let us say that the 12-item test referred to earlier is an attempt to measure performance on 4 objectives, and so 3 items have been written for each objective. In essence, then, each item in a 3-item set can be considered to be a measure of the same thing: the objective that it has been written to measure. If students have acquired proficiency on that objective, they should get all 3 items right; if they have not acquired proficiency on that objective, they should get all 3 items wrong. Parallel-item reliability can be assessed by simply counting the instances of agreement within item sets.

Let us examine the performance of the 10 students on the 12-item test. The array of item scores has been laid out in Figure 10.2. You can see from the figure that there is considerable consistency of performance within each item set. The only exception to this is in the set for objective 2, where students got items 4 and 5 right but got item 6 wrong.

Parallel item 
Reliability 





Figure 10.2 An Array of Item Scores by 10 Students on a 12-Item Test (X's indicate incorrect responses; blanks indicate correct responses)

[Table: rows are students 1-10; columns are items 1-12, grouped as Obj. 1 (items 1-3), Obj. 2 (items 4-6), Obj. 3 (items 7-9), and Obj. 4 (items 10-12).]


tamed On thfs 2 had been at 

■tains 4 and 5 and should be “wutS'r ' """I ‘ '1“' 
Given this change, it would be safe to conclude that the test had high parallel-item reliability.
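The counting procedure for parallel-item agreement can be sketched as follows. The response data are invented for illustration (they are not the entries of Figure 10.2), and treating the proportion of fully consistent item sets as the index of agreement is an assumption of this sketch, one simple way of "counting" among several.

```python
def parallel_item_agreement(responses, items_per_objective=3):
    """Proportion of item sets answered consistently.

    An item set counts as consistent when a student gets every
    item in the set right or every item in the set wrong.
    Each row of `responses` holds one student's item scores
    (1 = correct, 0 = incorrect), ordered by objective.
    """
    consistent = 0
    total = 0
    for answers in responses:
        for start in range(0, len(answers), items_per_objective):
            item_set = answers[start:start + items_per_objective]
            total += 1
            if sum(item_set) in (0, len(item_set)):
                consistent += 1
    return consistent / total


# Hypothetical scores for 4 students on a 12-item test (4 objectives x 3 items).
responses = [
    [1, 1, 1,  1, 1, 1,  1, 1, 1,  1, 1, 1],
    [1, 1, 1,  1, 1, 0,  1, 1, 1,  0, 0, 0],
    [0, 0, 0,  1, 1, 0,  1, 1, 1,  1, 1, 1],
    [1, 1, 1,  1, 1, 1,  0, 0, 0,  0, 0, 0],
]

agreement = parallel_item_agreement(responses)
print(f"proportion of consistent item sets: {agreement}")
```

In this made-up array the only inconsistent sets are those in which items 4 and 5 are right and item 6 is wrong, the same kind of pattern that would single out one item for rewriting.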


Split half 
Reliability 


bil'ty Sp/„ ?" ‘S split half reh 

’^‘'’p'nimis on each hall nl ^ ‘^“ivalence of perfori 
Xe„ ‘h ^P'“ It must be pointe 

Srss-' " 

rel.ab.hty, a test 

o^s * 'Pt '''ennuis.,"’””® *P= n-mbere 

tuo half “efficient that results fr 

“res hnwescr. describes thI„ru,‘'°"'P“”'‘t 

the reliability of only ha 





of the test rather than the whole test. Since the test user will be using the whole test, it is that reliability which is of interest. To calculate the reliability of the whole test given the split-half reliability, the Spearman-Brown Formula, given below, can be used.


$$\text{reliability of total test} = \frac{2 \times \text{reliability of half test}}{1 + \text{reliability of half test}}$$


In many cases, one of the reliabilities of published tests reported in test manuals is a split-half reliability that has been corrected by the Spearman-Brown Formula. Teachers, however, may find the parallel-item reliability procedure described in the preceding section more useful, since their tests are usually of the criterion-referenced type.
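The Spearman-Brown correction is a one-line calculation. The sketch below applies it to a made-up split-half reliability of .60; the function name and the example value are illustrative assumptions, not figures from any test manual.

```python
def spearman_brown(half_test_reliability):
    """Estimate whole-test reliability from a split-half reliability.

    total = (2 * half) / (1 + half)
    """
    return (2 * half_test_reliability) / (1 + half_test_reliability)


# Hypothetical correlation between scores on the two halves of a test.
half = 0.60
whole = spearman_brown(half)

print(f"split-half reliability:  {half:.2f}")
print(f"whole-test reliability:  {whole:.2f}")
```

For any split-half value between 0 and 1 the corrected value is higher, reflecting the fact that the whole test contains twice as many items as either half.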


Another kind of reliability is alternate-forms reliability. Alternate-forms reliability is based on the correspondence of scores on two tests that are intended to be substitutable measures of the same thing. Some test publishers publish two forms of a test: Form A and Form B, or form 1 and form 2, or a long form and a short form. If a group of students takes both forms and their scores on each closely correspond, the test can be said to have alternate-forms reliability. You can then use the forms interchangeably, an important feature if you are going to use them to measure growth (that is, to compare scores on the two tests with time or instruction intervening).

. „„ .mnlies IS determined by giving 

Test retest reliability asthena P „f„ch 

ItudeS the same test r'^;Jy^J„.„,strat,ons of a test corre 
students scores on each f hallmark of reliability a test that 
spend Since of two administrations with no s.gni 

,,, ,,„test 

ri xt fi-- ■‘fBreen"o';sTar:ms;m 

’^ reduce h* «h=>h'' ‘X7,7g.een on two different occasions 
aTerse:ion)Wh-:;;7tu„fuestomflu«^^^ 

these factors have I 


Alternate forms 
Reliability 


Test retest 
Reliability 





over, different error sources may be operating each time), increasing the assessment of a test's unreliability and making test-retest reliability a severe estimate of reliability.

The third shortcoming to test-retest reliability is that its determination requires two testing sessions (as does alternate-forms reliability). Unlike K-R 21, parallel-item, and split-half reliability, which require only a single testing session, test-retest reliability requires that the test be given twice. Test-retest reliability is therefore not often determined, which is unfortunate because this approach tells us about consistency over time rather than consistency over items, as the other procedures do. Since consistency over time is basic to the concept of reliability, more use of test-retest reliability is to be encouraged. When this approach is used, lower reliability can be expected because of the above-mentioned shortcomings.


SOURCES OF TEST VARIABILITY 


Overall Sources 


scores that doefnm rMuirf'’°"'°'’ variation 

Consequently ‘t is worthwhile exam,™™ measurement 

ry to distinguish between true "‘"g ''friability m test scores 
question and variability that re^» variability on the variable in 
reliability Knowing the sources o?^'^ contributes to test un 
p - to reduce them and hence tmpm^Vst Sduy”“‘ 

t“™i’re iot of test vartab.hty, 

Ihe test ^ urdividiiul includes ts’ rrud general charac 

orhe netlU'’™ "■'= - at emmnf r™' of 

reading teS'fr^'vrtstics he or she mL b '"■* 

characlj'snc uF is’™”'’''- 'aeh attemm "Pflude, and 

"'tr'ch attempt to m ""’"'''•“al However on TT'‘™ 
acteristics rnne. ? specific charo’,°" achievement tests 

'\e arc tr\intr t * ^**^^*®“*^® of measure ^ ^*"*^**^5, general char 
ba?ed on a "’uch a 

*o confuse iho ability such a learned varia 

"re issue However,tH"*--dmg skill, serves only 
eltcct of general abilities 





on a test of spec.fic sk.lis - 
others, It tv ill alTcct a test's reliability 


Sources of Test Variability 


Figure 10.3 


Last, ngand genera, character, St, os o. the ind... deal 

IJ! « to'CreLd , 0 . 10 ., ons tes,w,seness tech- 

ntques of taking tests General type presented in this test 

(3) Abihty Id solve prcb eras oMh^g^^ operating ,n 

(4) Athludes, (e g , sell-contidence) 

s,tua,lonl,hethetests 

Lasting and specihc character, s particular problems in the lest 

(1) Knowledge and skills reguire^ p,'^l,abils related to particular test 
,2) Altitudes, places brought to mind by an inquiry 

abou?such^'earson a perso^ (pyrtamati- 

Temporary and tests at a particular time) 

Characleristics-a9’ f 

PA'®"’ p specitio sharaoteristics onhe m particular lest 

y Temporary an ^P ™ faihire on a particular item) 

"eT'"-""™^:s.',or cS-A- r 

srn7ol habits, etc, related to 

(5) Temporary ,a p a question calls to mind a recent 

particular les 

‘"“"’’me selection 0, answers by guessing 

(6) Luck in t 


■ n ihe selc^' 

L Thomdike. personnet Setae, ion 1949 by pemuss.on 





Errors


We have all known people who were good test takers. This
Lfipi "ot an objective of any specific test 

look ntlipr^ ^ perform well on almost every test they 
noorlv on i/'t''*' ^ test anxious and charactenstically perform 
r conien.™" ® proficiency on 

chailc ensncs Thus lasting and general 

that reduce hVnr ™ “> and others 

Serihere a W™? ■'* ■■'“="‘>ed purposes 

Wdiiel that often rela'te Jo*soe"c‘!fi'ifT'^‘' characterisncs of the tndi 
automobiles are ^ items on a test An item m which 

closely attended to by°Lr'’Sife'^i'b 

Vtould favor the athlei.riti i* while items relating to sports 
specific chaVacterll^tic' con^ r"* general abilities 

bility because they reduce rehah I, sources of varia 

characteristics such as achtevem * ^ on tests of specific 

represent the goal or objective of Thi specific characteristics 
Temporarv measurement 

dude those thi are operatmetrir'"'^"" individual in 

■s tdten Eteiybody hLTbad dl,’’* P^d-'^nlnr time when the test 
take a test on such a day A recen?'/"'\.'* nnfortunate to have to 
or a failure in the heating svstpm family for example 

'«« performance Suchchlracter^,'" 

of 0 test s lariability that constnm. ‘o those portions 

=y do not relate to the true var rocasurement since 

Fmall .he ''“ability in the property being 

;utfn,d.L\feTooTlu:n'^'^s:il^^^^^ ’^hurae, ensues of the 

no 'ser„”.h‘e"‘ colceSf ^ 

the murmurinE of rii*. . j ^^mting because of street 
These IOC reduce rehab, if.;' '"••mg a, the next deS 

SEcS? ;r P-form 

fonnance o n ,?i, Proude a test per 

■'n ■ndiMduaUh,?^"'’^'' ’“sting and 

'•nd other iimnn ^ ’^”8 tests nmri ^’^^ractenstics 

houcAcr minim, characi ^“*‘8oe inattention 

•■■■’■"• ‘-Sir, 

^ temporary general or 






j ,.riiial other than those that the 
?he Si's characlenstics will^b ^ vanables in the test 

:;rbW of the J-s These 

Itself, both of tvh nh ,J„„ms,raJ,on i^ect ^t Pe 

^“’’*"Xb l ty by having the individual 

hence test rehab^ y general 

takers T^P^u of the conditions 

?"e'°" ZtsZ or noSe will also affect test perfoimance 

example, and CO ^ ,g5, „ ,,„„™ of variability and thus 

Sr'SSSs..'' 





variability can be used as a basis for 
vanaMuv .rr reducing that portion of the 

Ihrtmnor “ ^ measurement (primarily 

the temporary ones) and thus increasing reliability 

strategies for building reliability into a test 

and above thrs'iMfe*'fectorr°” performance over 

can construct rehible tests hv c '"“=r“ted in measunng you 
to the greatest extent nnssibl “"trolling the extraneous factors 
this purpose are discuLed feRw'’""' ‘‘“““"""="‘>“‘1 procedures for 

T«n°et U 'Se? a' b"n7h “ ^>'or,er one because 

■lent were to make a careless Per objective If a stu 

“burred m an "nfamrhar ronte« J “r tf one item 

OR 'hat thrstuieoTw® to miss It 

“bjecuves Th.s conclus.on vvouid “ o'* “■ °f 'b" 

" would redect the test’s failure r s‘"‘Jc"t’s failure 

euesscd correctly on an Item o,^ Contrariwise if a student had 

prof'c,:‘ror.fc;™wyco"S itht 


torrectiy on an Item oV , '“““'vise Hast 

paper would be ,„accutSdv ® ?T‘* °r her „™„oo. 

proficiency on that objective '*■= student attainei 

- per objec,.ve-or tnpfed 2’’= "f" doubled to 20 item 

wol d would be reduced ? “^“0' °f 'bes 

the lest vv Tj ‘betr effect on i?*^ ™d guessmi 

.. "ould be lessened Ir conclusions drawn fron 

Tarictlngiic^, ,, 

niust be x\Tit((vn fn 

T?'bi! - pepresenn 

' “U'l “""me apprS b= made to cor 

"h"tes,otes,appropr?,;“ b presented in Chap.e, 
but to test reliabihtj 





as well. It is easier to write 10 pairs of items, two per objective,
than it is to write 20 items for a single broad course objective.
Subdividing helps the test writer to write targeted, hence specific,
items, increasing the likelihood that items within a cluster are
measuring the same thing. Since reliability is based on item
correspondence or consistency, targeting helps insure reliability,
particularly in the case of tests of specific skills such as
achievement tests.


Box 101 


W I .ohifl Player awards were started in 1929 
(1) The baseball True False 

j » .^lau^9r with an annual award 

, , rf states produM taroi revenues in excess cf 

(4, Five stales in the United States P P,,,, 

av Anderson (1969) systematically 
The above lour Hems /easier to read-that is are written 

aHina level (items 1 gn^j ,n interest level (items 1 

vary in ^ , ,f,an items 2 ,he Anderson Study- 

„ a lower 7/ " J ,„„„g_al Hems ot varying read 

and 2 were more in “""^ /Reading level represents a last 

than Items 3 and ^,.f,out realizing level a lasting anc 


;;“'a„dinle,esl,e^-;'^ 

mg and ,ic Keepm9 i.kelihood that all Hems are 

specie oy i"-^-;7:7o::rill, leva, c. reading abllHy 

increases ' = " ,h,„g M^'^^the interests or biases It involves will 

measuring the sa of fte ^ 3^3, sga 

Sirono Idiosyncratic" 





Controlling for General or Specific Skills

Setting Difficulty across Items at Intermediate Levels

Controlling the Conditions of Administration


le!t'*rpW effect of reading level on an achievement 

of temfrj.l? ^ ‘he vanabihty in the readmg level 

skills surli ^ relates to its reliability On tests of specific 
If he eliLT reliability is likely to be higher 

constant Thus are tept 

reading skills von sb *'bt ^ measure of something other than 
m“teTv the ssm™ r f"TP‘ “> ■“=”= at approx, 

across Items based on 1"^ to avoid introducing vanability 
In anr.esrof sneers of the test Uself 

be kept relatively constamL ^ g™'™* shill requirements must 
ftion of these skills hut items Reading is the most com 

■n relation .ra p‘rs„„?rr/'’''' he consistent 

the qualities measured hv' ."if self-concept unless these are 
applied to specific charactens.* Constancy should also be 
competencies but this is oarii mterests or particular 

Items must have content u is iT^ ^ because all test 

h'hl) m Item performance thai'^f ‘o control the extraneous vana 

10 the quality being measured wilhntrodS' 

tlie domain of'^'^bjecuveslem'' “‘'='!“ately represent 

idiculiy le\els vary greatly from ^ Moreover where 

test reliabihiie^s ar ^ According to Lords 
1 mto the 50-750. range ( ha, 7 difficulties 

loot! rehab, bV “f ■""‘""adiateTfficu'l.r.o gue J’he mst 
''herever possible test s l 

- " 'o\bm^nra"f “"‘ro- 

tacludc mslnictions”* V '"“rc than just 1),*'°'' 'hey can It is 
hv conducted 'he eondilions und'^ "’‘‘‘len items Tests 

T'rnng eond,.,„ s ^ -a ‘o 

1“ crg^linB anj 





of their tests These procedures for increasing relia 
‘“""S outlined m the test cn 

tenon checklist as follows 
Is My Test RELIABLE^ 

(I) Are There Paired or Parallel Items That Agree 

^ objective) 

“d those who get one 

t^ng get the other wrong’ 
b Have nonparallel items been rewritten’ 

a" Is™ch‘«err“ "'■* Test Performance 

wellonthetotauS’™'^ students who do 

b “tive. neons, stent Items been removed’ 

^ a! "7a Uodorstandable 

^•»g 'temd:nt;r^^= - = basis for evalu 

(4) 

Unbiased’""® Proved to Be Systematic and 

b Are S'Snng'’!;nten7a®nYn '^'’7°"*’®’“' ''''"■‘s’ 

suitable as they can be'^ Procedures as detailed and as 

raewlir =*" '""'™"'n''*'ml«1,nnE'd''' °t ‘accuracy 
able tloahty accuralelv amt ^ t* t^°es not 

Ine lo ’'™(eiii/y J, „ || r ' b * ttieasures, but does it 

!o dll 'ZT, «“ “P -nd dZ tf ■' 

■h’t IS f7m 7 K a 1^15 77°”'"' to moment day 

‘Jctcrmininc mA ‘otprovinp being con 

P^mionam ' " '"'Paving the oon:i;.7c;:/“7L7ffom 


^PWtncnt Exam 

-'"te 

™Mcc on each of the n students dis 
‘be two parallel items r 



Using Test Results to Improve a Test's Reliability


pair. If the items in each pair are parallel, the majority of
students should get both items in any pair right or both wrong.
Those instances where students got one right and one wrong in a
pair indicate that the items are inconsistent; they are in fact
nonparallel. As a pair, such items lack agreement or consistency
and would reduce reliability. Item pairs in which a third or more
of the students get one right and one wrong should be reexamined.
These items should be rewritten as a pair, or at least one member
of such pairs should be rewritten.
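
As a rough illustration of this check, the short Python sketch below
tallies, for one pair of parallel items, the fraction of students who
got one item right and the other wrong, and flags the pair when that
fraction reaches one third. The function names, the 1/0 scoring
convention, and the sample responses are assumptions made for the
example, not part of the procedure itself.

```python
from typing import Sequence


def pair_disagreement_rate(item_a: Sequence[int], item_b: Sequence[int]) -> float:
    """Fraction of students who got exactly one item of the pair right.

    Responses are coded 1 (right) or 0 (wrong), one entry per student.
    """
    if len(item_a) != len(item_b) or len(item_a) == 0:
        raise ValueError("both items need the same non-zero number of responses")
    split = sum(1 for a, b in zip(item_a, item_b) if a != b)
    return split / len(item_a)


def needs_review(item_a: Sequence[int], item_b: Sequence[int],
                 threshold: float = 1 / 3) -> bool:
    """Flag a pair when the split rate reaches the threshold (assumed: one third)."""
    return pair_disagreement_rate(item_a, item_b) >= threshold


# Hypothetical responses of ten students to two pairs of parallel items.
item_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
item_2 = [1, 0, 0, 1, 1, 1, 0, 0, 1, 0]  # disagrees with item_1 for 4 students
item_3 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
item_4 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]  # disagrees with item_3 for 1 student

print(pair_disagreement_rate(item_1, item_2), needs_review(item_1, item_2))
print(pair_disagreement_rate(item_3, item_4), needs_review(item_3, item_4))
```

In this made-up data the first pair would be reexamined and the
second would not.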


J on analysis of the relationship 
After a test has been administer o„<,tysis ye^ 

between Item scores and total est^yco^^^_^^ ^ °s 

often reveals items tha scores shown in Figure 

parts of It The pattern ^adequacy of a 

Led as a basis *0 o*- '‘'LtaLhen rtiLro. 

that Item was found not would th , 

uring the ^“LLems hereby ■nc'reaymg^ 

revise the bad item recommended that results of a test 

the total test It is ^ Sf F « n'aLLuL of item 

teachers or commerciaUest examination^ 

by means of an ite , j^fy the bad by a group of 

students to ind analysis >s performance on an 

The P“rPf = ^ do tlem “ nee on the total test by 

with the total ^ , -nal and the per L,p across a number of 
.tem by e xamine this ^,oden. 

sis:,', -!■ 5'..*"" *"■' 

initially try a mat a test wdl be 


Item Analysis 





themos'trehaWefesr'' 

lakers into hg^srarersanifln"*^ simply by separating test 
companng the nerforma ^ scorers on the total test and then 

then select thoL trZ. “fa®?“P ^ou can 

groups that ts items thai ”r°st differentiate between the two 

most'of the l";”™;*!!" “f g<=‘ -<< 

be expected to yield the ere These items, when revised, can 
can ako sen, agreement 

into high and low scorers"and' students who have taken the test 
Ilem difficulty using ,he follow, ng°fo™u| “s'"" ^■*™™“l>'’il,ty and 

flamDiscriminability.Jl^jrfJnghscorH^ Nn oflowscorers 
"rho go. It right ^ got It right 
Item DlfBculty ~ of hiaR €•/»/»- — 


h -No of low scorers 
who got It right 


dems°r UemTaTd bad' ■^gki, 

00^,, m '“''•"g g' Ihe ansrtrl s^' ?=* 

louU?om 

"Uniber of tim ^ at resnnn computing tl 

^haice silua " “c'' ‘‘■^‘««or or w oounf tf 

'icr of students 'Indents and ? “ a multipl 

'bai is choscn*m^^° r''"’" 'bo right anf '"Pto d to the nun 
option that noon"" b^''^'"baa the right a ^ '‘"gle distractc 

^■foren, warthan '"oy be " d-stracor o 

on itemThaT's'"'' '"""dod ■ t^be Or " 

'"nt ,s j '“ bn Or you may have mu 

sgoodo 




betler an answer than the one ke>ed conect 
.ten. that has to he changed to -P™- 

d.stractors maj 'f'‘*‘“//,"3,soworthwh.Ie wexam.ne 

occur, cons.stency would he poor It , ^ on on .ten. in 

.tern d.fTtculty based on o.crall studen P atfficult 

order to detect .terns that may be too o,an..n.ng student 

In add.t.on to ^ h’efpful ,o d.iuss the .terns w.th 

response patterns on .terns, tt ,^ 55,00 of the .terns with 

students after they take the t niay help po.nt out 

colleagues .s also f'='Pf“'' f „jo„,a„d.ng and suggest ways that 
sources of amb.gu.ty and m „,e ,i,e.r shortcom.ngs 

problem .terns can be rev.sed to overcom 

u .nas are the measunng rnstruments 
Often m educat.on hu""*" Irue-false or matchrng tests 

Other than .n "'“'“P''! "ouvl and autoraat.cally, .t .s human 
wh.ch can be scored obj .-curacy of answers 
judgment that determines “^^acher reads the responses 
^ On an essay test, f°f,='‘"'"f'tnts- competency or proficiency 

and makes judgments ->=0“' read -hat essay a second 

What would happen .f ^ of performanw be made If 

time’ Would the same j.Uerent from the first, who is to 

siiiisiiii 

the onhe mfltVced^by expecta 

sponse and hec ^^^^onses ,„oe and apparent ability 

the scoring o^ ^ ^ past pe^ ons affect your scoring 

X?y- unconsmous_^-s or exp^^ to this 

rmnrse“f«5 2 ;;-,P;®liw..y of yo^ 

■"'•rou can -ore every -ayj"“st.ons ,„.ce and 
rescoringlty 


Reliability of 

Scoring 

Procedures 





score all the essays and rescore 20% of them in the time it might
have taken you to score them once without explicit criteria.

In performance testing or behavior measurement you are dealing
with questions of judgment and should use reliability observers
when possible. You need not necessarily use a second observer for
every test or observation (that is, you need not collect two full
sets of data), but you should include a second observer for one out
of every five observations or make it yourself a second time. If you
bring in another person to serve as a reliability observer, be sure
that both of you are there at the same time so the two of you will
be observing the same behavior, but be sure to make your judgments
independently. The comparison between the two sets of judgments
can then serve as an indication of the reliability of these
judgments. You may also want to practice first with the other
observer to increase the likelihood that you can get reasonably
good reliability with respect to that person.
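
One simple way to summarize the comparison between two sets of
judgments is the percentage of observations on which the observers
agree. The Python sketch below assumes each observer records one
category per observation; the function name, the category labels, and
the data are invented for illustration, and percent agreement is only
one of several possible indices.

```python
from typing import Sequence


def percent_agreement(first: Sequence[str], second: Sequence[str]) -> float:
    """Percentage of observations on which two independent observers agree."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("both observers need the same non-zero number of judgments")
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * matches / len(first)


# Hypothetical judgments made independently on the same five observations.
teacher = ["on-task", "off-task", "on-task", "on-task", "off-task"]
second_observer = ["on-task", "off-task", "on-task", "off-task", "off-task"]

print(percent_agreement(teacher, second_observer))
```

Here the two observers agree on four of the five observations.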

Here are some suggestions about how to improve your reliability
as a scorer. First, cover the students' names before you score
so that you cannot be influenced by your expectation of them. This
is called scoring blind.

Second, structure your response key as much as you can in
terms of what answer you are looking for, how many points you
will give for organization, content, creativity, problem solution,
and rationale. The more scoring specifications you can generate
and write down (and hopefully communicate to students so they
know what the criteria are), the more likely you will be able to
make these judgments consistently, time after time, student after
student.* Refer to pages 124-35 for a more thorough discussion of
these points. (Also see Appendix B on test item specification.)


compcution^M 3^4ging of dwm 

-Jpmrn.s (that h thTh SiSrald 

Incrtav: icormg rcliabiliu discarded each time t 

‘■ip averaf 





Additional Information Sources 

Dick, W. & Hagerty, N. Topics in measurement: Reliability and validity.
N.Y.: McGraw-Hill, 1971.

Doppelt, J. E. How accurate is a test score? Test Service Bulletin
#50. N.Y.: The Psychological Corporation, 1956.

Stanley, J. C. Reliability. In R. L. Thorndike (Ed.), Educational
measurement, 2nd ed. Wash., D.C.: American Council on Education,
1971, Chap. 13.

Thorndike, R. L. Reliability. In Proceedings of 1962 Invitational
Conference on Testing Problems. Princeton, N.J.: Educational Testing
Service, 1964.





Self-test of Proficiency


(1) We use the term reliability to designate

a. the fit between a test's objectives and its items.
b. the extent to which test scores predict future learning success.
c. the degree to which the test measures the same thing time after
time.
d. the meaning of the scores on a test.
e. the absence of cultural bias.

(2) The reliability of a test can be used to express the extent to which
it gives a consistent measurement across items.

TRUE FALSE

(3) The standard error of measurement on a test is

a. a measure that increases as the accuracy of a test increases.
b. the difference between predicted scores and obtained scores.
c. a measure of true variance of test scores.
d. the difference between true scores and obtained scores.

(4) The reliability coefficient is that portion of the variance in test
scores that is the result of errors of measurement.

TRUE FALSE

(5) Match the type of reliability at left with its definition at right.

a. Kuder-Richardson 21
b. Parallel item
c. Split half
d. Alternate forms
e. Test-retest

i. consistency of performance across items that are intended to
measure the same objective
ii. consistency of performance across different tests that are
intended to measure the same objectives
iii. consistency of performance across different administrations of
the same test
iv. consistency of performance across the odd-item and even-item
segments of a test
v. approximation of the correlations among all the items on a test
vi. approximation of all the correlations


ret Frtr .u 1 .. among nan the Hems on a test 

locmiry the type el reliability depicted 

° llvided rntd two 

'^dch student 

ths, odieoL 



chapter eleven/Interpretability
and Usability
of Test Results


OBJECTIVES

1. Define the test interpretability concepts of a. norm referencing,
b. norms, and c. norming group.

2. Identify and contrast four kinds of normative scores, namely
a. standard score, b. stanine score, c. percentile rank, and
d. grade equivalent score.

3. Identify characteristics of norm referenced tests, namely a. item
revisions, b. standard instructions, c. norms and interpretation
based on them, and identify their strengths and shortcomings.

4. Identify characteristics of criterion referenced tests, namely
a. based on objectives, b. designed to be appropriate, c. measuring
performance, d. using predetermined cutoffs.

5. Describe the use of four criteria for determining the
interpretability of a test, namely a. relation of scores to
performance, b. definition of acceptable performance, c. diagnostic
and evaluative value, d. useful relative information.

6. Distinguish between and apply four criteria for determining the
usability of a test, namely a. tedium, b. practicality,
c. administrative procedures, and d. readability.


WHAT IS INTERPRETABILITY? 


Interpretabihty has to do with what the f ™ ^ 

thatTs, what they tell us about the test " {.d 

characteristics being measured While measure, 

ity tell us whether the test understanding infer- 

interpretabihty provides us with a should a lest measure 

mation conveyed by the test score * the results m a 

what we want it to measure, but it snoui p 
form we can understand and use question of in 

The result of a test is called a d\vhat the raw 


The result of a test is called a jersland what the raw 

terpretabihty is how can we nood’ Is it adequate’ The 

score means’ Is it h'sh’ fstased on inierprctaiion 

raw score is only a number, its mean ® ,tem chemist r> 

/rate in items rignt , ,.ifRr«tr»ni 


raw score is only a numoer, ^ .... . 

For example, a student . .w performance be sulHcieni 

test What does this tell us’ ^<’“''1 P "dvanced course’ To 
to pass the course’ To go on w,th a basis for inter 

become a chemist’ Unless a test P*^ l[j g ponit of reference 

preting-that is, unless a test provides us 

— it IS not a useful test ooints **in 

There are two types of student’s lest score u 

to interpreting a test The first is (-all this " 

the scoL of other students £ in 

referencing The second is to es calHh*® ^'^‘^^c^Vwill jjg 
relate the student’s test score to rt “ ^^^d.ty Each will 
mg These are different bases for m 
described 




ncinrnF'itCI.'vC 


Teachers are often called upon “ jf '"/“"'“''“mSc 


are norm referenced— that „/ norm 

the relative performance of P order to 
(We call this information norms i 





occurrence of 

m^Dst psychological and physical traits m the population It is 

ward Ser „d to 

people score islnbution In other words, many more 

designa'eTas 0 stSidardTe ‘‘f ">’““<>? ■" H 1 has been 
drawn to indicate distances '^^rtical lines have been 

deviation units A standard de^"^ * ' “nter in terms of standard 

Recall that tie sSrd ^ 

average of all differences between j" approximately the 

The greater the differencerbetween mean * 

standard deviation greater will be the 

one 5tand*d deviation "of ThJ'm”" ‘'7 

acores falling rvithinferstandari^ P'^oontage of 

IS 99 9». or virtually all the Icol: ,f »’= C=^3n) 

f on a test wtth the dtsmbuimn of a score of 

^ standard deviation of 5 having a mean of 40 

•andard deviation uni. aWe oxactly one 

^o'd fall at -le TsctmoTmrio'”' ^ °f 35 
standjd deviations below themtmf he two 

nnits in 

'-d 

Tscore Th“e Weehd''^" of ".o "e Ids “ 

standard dcvimo f use a Les . ^ 'tolled a 

mean set at 500 snH°^ vvhile the College Jin nnd 

d'ustrated in Figure'1tt^u'‘™“>'onoI loo^Tfe 
deviation based L X 'V *^''"8 'he preset „ “*■“ "re all 

group a guen set of^° ‘‘‘"rrhulion of test sr standard 

-O'os a' samite ofTh" oan^^ f^^he normiug 

‘scores IS shown in of raw to standard 

"•n Appendix c scores to standard 

* The standard deviar.n 

''‘"»on,s calculated by the fnii 

'"“"--^-owngerof^s 



Norm referencing 


283 


Again, the reason for converting raw scores to standard scores
is to represent the scores on a relative basis within the test group
itself. Neither the order of the scores nor the distribution of the
scores is changed by this procedure, but the scores themselves are


Figure 11.1 The Normal Curve, Percentile Scores, and Types of Standard Scores *
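
The conversion of a raw score to the standard scores pictured in
Figure 11.1 can be sketched in Python as follows. The sketch assumes
the conventional definitions: a z score is the distance from the mean
in standard deviation units, a T score rescales z to a mean of 50 and
a standard deviation of 10, and the College Board scale uses a mean
of 500 and a standard deviation of 100. The function names are
hypothetical, and the sample values reuse the mean of 40 and standard
deviation of 5 from the example in the text.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the group mean, in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


def t_score(z: float) -> float:
    """Rescale a z score to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z


def college_board_score(z: float) -> float:
    """Rescale a z score to a mean of 500 and a standard deviation of 100."""
    return 500 + 100 * z


# Test group with a mean of 40 and a standard deviation of 5.
for raw in (45, 40, 35):
    z = z_score(raw, mean=40, sd=5)
    print(raw, z, t_score(z), college_board_score(z))
```

Note that the three raw scores keep their order and their spacing;
only the scale on which they are expressed changes.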







If only he could think in abstract terms


leave the distribution of raw scores unchanged (only the score
values themselves are changed), the conversion to stanine scores
changes the original raw score distribution to an approximately
normal one.

The stanine score has the great advantage of being easier to
use and easier to interpret (the latter being true because it
is a single digit) than other types of standard scores. However,
because each score represents a band on the continuum
rather than a point, stanine scores are less precisely stated than
other standard scores.
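
To show why a stanine represents a band rather than a point, the
Python sketch below assigns stanines from percentile ranks using the
customary percentages for the nine bands (4, 7, 12, 17, 20, 17, 12,
7, and 4 percent). Those percentages and the function name are
assumptions of the sketch; a published test's own norms tables should
be used where they are available.

```python
from bisect import bisect_right

# Cumulative percentages at the top of stanines 1 through 8
# (assumed customary band widths: 4, 7, 12, 17, 20, 17, 12, 7, 4).
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile_rank: float) -> int:
    """Return the stanine (1-9) whose band contains the given percentile rank."""
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must lie between 0 and 100")
    # Count how many band limits the percentile rank has reached.
    return bisect_right(STANINE_CUTOFFS, percentile_rank) + 1


for pr in (2, 41, 50, 59, 98):
    print(pr, stanine_from_percentile(pr))
```

Percentile ranks of 41, 50, and 59 all fall in stanine 5, which is
the loss of precision referred to above.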


A rank describes the relative standing of a raw score in 

A percentile rank descr 

“ and what percent scored higher Again look at Figure 

scored lo lohpled nercentile equivalents which repre 

L ‘t p" PoLts on t dLsLibution “xhe person who gets the 


Percentile Rank 




Figure 11.3 Student Profile Report on the Stanford Achievement Test *




middle or median (Md)⁵ score has done as well as or better than
50% of the test takers and hence is at the 50th percentile. Thus,
the percentile rank is not based on the absolute size of a score but
on its relative standing. Because the size of the interval between
percentiles is not standard, that is, not uniform, the percentile
rank is not considered a standard score. (An example of the
calculation of percentiles appears in Appendix C.)
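
A percentile rank can be computed directly from a set of scores. The
Python sketch below uses one common convention, counting everyone who
scored lower plus half of those with the same score; this convention,
the function name, and the scores are assumptions for illustration
and may differ from the procedure in Appendix C.

```python
from typing import Sequence


def percentile_rank(score: float, scores: Sequence[float]) -> float:
    """Percentile rank of a score within a group of scores.

    Assumed convention: percent scoring lower, plus half the percent
    obtaining exactly the same score.
    """
    if len(scores) == 0:
        raise ValueError("the group of scores must not be empty")
    below = sum(1 for s in scores if s < score)
    equal = sum(1 for s in scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(scores)


# Hypothetical raw scores of nine test takers; 18 is the median.
group = [10, 12, 14, 16, 18, 20, 22, 24, 26]

print(percentile_rank(18, group))
print(percentile_rank(26, group))
```

With this convention the median score in the hypothetical group lands
at the 50th percentile, whatever its absolute size.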

A student profile report on the Stanford Achievement Test
appears in Figure 11.3. Note that both percentile ranks (PR) and
stanines (S) are provided for scores based on both national and
local norming groups, although only the former appear in this
particular case. When compared to a national sample of fourth
graders who took this test at the time of its standardization (the
norming group), the student whose achievement is reported in
Figure 11.3 scored particularly high in the language arts area and
at about an average level in the other areas. This is shown
graphically in the charting of stanines.

Since norms are provided for an entire grade level, and
achievement occurs continuously throughout the grade level,
normed scores will be partly dependent on when the test is taken.
The student reported on in Figure 11.3 took the Stanford Achievement
Test part way through the grade (specifically, in December).
Further experiences in the grade may not produce any measurable
effect on the particular form of language arts subtests used when
scores are already so high relative to the norm group. Such is a
limitation of percentile ranks, where you are typically limited to a
single grade level within which to make comparisons even though
(1) test taking may occur at different times throughout the grade
for different school districts, and (2) some students are performing
at levels closer to students in other grade levels than in their
own.⁶

Figure 11.4 shows a table of norms for a portion of a specific
grade level as taken from the Stanford Achievement Test. With
this table it is possible to convert an individual raw score into a
percentile rank, thereby providing test score interpretability
without recourse to standard score calculations. Tests such as these
provide test takers with both their standard score and their
percentile rank, each of which relates their test score to the scores

5 The median score is the middle score in the sequential ranking of scores;
the mean score is the mathematical average of scores. They are seldom the
same except in the perfect or ideal normal distribution.

6 These deficiencies may be somewhat overcome by using grade equivalents
in addition to percentile ranks.



Figure 11.4 A Sample Norms Table Taken from the Stanford Achievement
Tests *

Stanines and Selected Percentile Ranks Corresponding to Raw Scores by Test, End of Grade 2









of a norming group. The percentile rank gives the test score a high
degree of interpretability on a relative basis, since without it, it is
hard to determine the quality of a student's performance.


Norm referenced scores on achievement tests can be expressed in 
a form other than percentile ranks or stanine scores, which as we 
have seen, are based on the comparison of present scores to thos 
already obtained on the test by a norm group of the same age an 
grade level An alternative approach is to 

tcore to scores across a number of grade iden ify he 

grade level at which the given score ,s most a-'j-'ar ‘hat of the 
forming group average Such a procedure results in grade egmva 

'"“to oblain a score that represents an average or typical^per 
formance take the standard scores Any 

groups at each grade level and “"‘P“ ( 5 , is aver 

student getting one of assigned 

age tor the norming group at t g 

the grade equivalent ‘ beginning fourth grade norming 

sponded to the average of t be 4 I, ,f it corresponded 

group, the grade equivalent s e ^o^ norming group, the 

to the average for the beg n g^h ^^g ^ 

grade-equivalent score ,bese norming group 

hood IS that actual aeores ,bis, the score 

averages rather t^han dire ^ score for each grade 

range between the awrag succeeding one has been arbi 

level norming group and based on a 10 month school 

trarily divided into ten equal parts o _ ^ „p „n,fonn 


year and the 


:d into false assumption of uniform 

, accessary hu‘ P J ^ halfway between 

growth over time ^hus. ' norming group and the fifth 

the average for the fourtn ^ equivalent of that score would be 
grade norming group th g average per 

4 6 Ifthescoreis atthe W (Most achieve 

formance, the grade oq 6 or 9 within grade levels ) 
ment tests are nomed a ' ^ ^ent scores for the Word 

Figure 115 Adlievement Test (1970 edition) 

Knowledge subtest ^ ponding standard score Grade-equiva 
each in terms of its publisher based on administrations 

lent scores are provided by school month 

of the test given , fpr each elementary grade level these 

(usually in face type m the figure (All grade equivalent 

scores appear m bold ta 


Grade equivalent 
Score 



Figure 11.5 Standard Scores and Corresponding Grade-equivalents on the
Word Knowledge Subtest of the Metropolitan Achievement Test *


Grade Equivalent

2.0 

22 

23 

2.4 

25 

26 

27 

28 

29 

30 

3.1 

32 

33 

34 

35 

36 

37 

38 

39 

40 

41 

42 

43 

44 

45 

46 

47 

48 

49 

50 

5.1 

52 

53 

54 

55 


Standard Score 

45 

46 

47 

49 

50 

52 

53 

54 

55 

56 

57 

58 

59 

60 
61 
62 

63 

64 

65 

66 
66 

67 

68 
68 

69 

70 
70 
71
72 

72 

73 

73 

74 

75 


Grade Equivalent 

56 

57 

58 

59 

60 
6.1 
62 

63 

64 

65 

66 

67 

68 

69 

70 

7.1 

72 

73 

74 

75 

76 

77 

78 

79 

80 
81 
82 

83 

84 

85 

86 

87 

88 

89 

90 

9.1 


Standard Score 
77 

77 

78 

78 

79 

80 
80 
81 
81 
82 
82 
83 

83 

84 

84 

85 

86 
87 

87 

88 
88 
89 

89 

90 

90 

91 
91 

91 

92 

92 

93 

93 

94 
94 

94 

95 







scores are based on this single administration.) If the middle
standard score among students at grade level 2.1 is 47 on the Word
Knowledge subtest, as Figure 11.5 shows it to be, then any students
who obtain a standard score of 47 can be assigned a grade equivalent
score of 2.1. They are performing at the same level as the
middle child at this grade level. They may,
graders or Hrst graders, but their score of 47 j 

middle scorer in the 2 1 grade level Thus, they would be assigned 
a grade equivalent score of 2 1 meaning that 
the test at the middle level for beginning of 

Since students are not tested and 
school, but are typically tested once (or a ’ available 
emptncal data for use in assigning to get the 

for only one or two mne Inths^in the 

grade equivalent scores for the ot conversion tables as 

school year, interpolation is used to P P . interpolated grade 
shown m Figure 11 5 Keep in ™nd that *es^nmtp 

equivalency figures are determ y process that auto- 

tween empincally determined scores by W a P 
matically assumes that equal learning occurs ea 
ous assumption indeed u ^ the Metro 

When students take a" achtevemen e t^euch^a^^ 

pohtan Achievement Test, their answers) are 

scores they °htaia— 4S“ally “ ^ af ,]|c norm 

transformed into standard sc transformed ’ into 

ing group The standard mterpretability If a child 

grade equivalent scores fo p P obtains a grade equivalent 

has just begun the , the child is sconng on this one tesl 

score of 3 5, this indicates that *e cm i ^be third grade 

at the average level of student halt^vay .hr gfade mates (begin 
rather than at the average level of his 

ning fourth graders) comparable across subtests of a 

Grade equivalents are n average for their 

battery for students who are position within his or her 

grade A student whose Peueeutde ^ equivalent scores 

grade is the same in all subtest y months of score 

m the different subtests ,2.%rcentile ranks) are not of 

Furthermore, grade u-iuivalents ^ ^ pc,„l of raw 

equal size across the scale At f = “^crence m grade equivalents, 
score may make several mon*s of di pP,„, 

whereas at the middle, an ‘"ur^s equivalents 

make only one usonth of ddferen obtains 

cannot be interpreted literally 





Tests 


a math computation score of 4.8, that does not mean he or she can
be moved immediately into the fourth grade.
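
The interpolation used to build grade equivalent conversion tables
can be sketched in Python. The sketch assumes straight-line
interpolation between empirically determined averages, which builds
in the assumption of equal growth each month; the anchor values, the
function name, and the rounding to one decimal place are hypothetical
and are not taken from any published table.

```python
from typing import List, Tuple


def grade_equivalent(score: float, anchors: List[Tuple[float, float]]) -> float:
    """Interpolate a grade equivalent for a standard score.

    anchors -- (grade_equivalent, average_standard_score) pairs obtained
               from actual test administrations; average scores are
               assumed to increase strictly with grade level.
    """
    if len(anchors) < 2:
        raise ValueError("at least two anchor points are needed")
    points = sorted(anchors, key=lambda pair: pair[1])
    if score < points[0][1] or score > points[-1][1]:
        raise ValueError("score lies outside the range covered by the anchors")
    for (ge_low, s_low), (ge_high, s_high) in zip(points, points[1:]):
        if s_low <= score <= s_high:
            # Fraction of the way from the lower anchor to the upper one.
            fraction = (score - s_low) / (s_high - s_low)
            return round(ge_low + fraction * (ge_high - ge_low), 1)
    raise ValueError("no interval found for score")


# Hypothetical averages for beginning fourth and beginning fifth graders.
anchors = [(4.1, 66.0), (5.1, 72.0)]

print(grade_equivalent(69.0, anchors))  # halfway between the two averages
```

A score halfway between the two hypothetical averages is assigned a
grade equivalent halfway between 4.1 and 5.1, whether or not learning
actually proceeds at an even monthly pace.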

tations“p T’’ has its advantages and hmi- 

standardired msTpro^S^S' r’d.l^r^m ki“nd 1 

“cTol'’drr‘em ^itesr ^ 

inte^mting the same raw scor in^rus m™ 

are also calleVpubhshed°tesTs) (which 

■s based on norm referencing are know?h ““=‘'P''‘='^hility 
tenstics ®' *^nown by three general charac 

'c/inerln mh?r"worl! tiTuerafon a'"" 

necessarily the originals They have beL?”a 
the tryout analyzed Based An tu ^t^ed out and the results 
been deleted or reLed soThat the I^ave 

y effective (m separaitne hich « * remain are reason 

°‘herways) Similarly. Jco^gW or m 

revised to eliminate ambiguity^ ^ ^ answer choices have been 

les'.s sta„dard,zed test 

■be same ma™^r^ .v “^ministered, ,t vSrbTfd*' 

^‘'“";->.esam':td:::r“ "■= .h“;tr o/r 

" '■■ Ihe norms^ihat f tet with its Im“ “ (<■■>■ 

^■"^-■■"8 It wiA a Tar P-=rf°nLa" Vrr=‘T““^' f- 
■'■= -™f„Vgrup“ '’"“-■nSteT peTL™! 





In Summary

The basic value of norms is to indicate how high or low a student's
score is, independent of the difficulty of a test, by comparing it to
scores of others on the same test over a period of years. Standard
scores, stanine scores, percentile ranks, and grade-equivalent
scores are normative or relative versions of raw scores. They tell
about performances and characteristics of test takers relative to
the performances and characteristics of other test takers in a
reference testing group. If a test was hard, it was hard for all; if
easy, easy for all. Norm referenced scores thus help us to (1) interpret
individual scores by comparison to group data, and (2) make
conclusions based on test scores, therefore, somewhat independent
of the failings or weaknesses of the test. This latter point is worth
amplifying. We may not be sure whether a particular test is too
easy or too hard. To compensate for the test's possible inexactness
we use i^not as kn absolute measure of to' the mLr 

determining the relative capabilities ® ^ m mind First 

Two shortcomings of norms must be kept in mind. First, people,
cultures, and societies change over time, and norms can become
dated. Norms reflect the types of performances of which people are
capable, based on the pattern of their experiences. As educational
practices change, dated norms can become a serious problem and lead
to misleading interpretations. Second, an emphasis on the relative
standing of test takers tends to obscure the relation between the
tests and any bearing on the past or future reality of the test
takers. If the test results are to have any bearing on determining
the educational future of a student, norms should not be seen as a
substitute for validity. Moreover, there are times when we are
interested in the absolute value of a score rather than its relative
value. For this purpose, we now turn to criterion referencing.


CRITERION REFERENCING 


Thus far in this discussion of test interpretability, only norm
referencing has been considered, that is, only the interpretation of
test scores on a relative basis. Most published tests are norm
referenced although, as we shall see in Chapter 13, published
criterion referenced tests are beginning to appear. Most published
tests are norm referenced because they are typically designed by
someone other than their user and are constructed



294 


Interpretability and Usability of Test Residts 


m such a way as to have their primary meaning in terms of how 
different groups perform on them It is possible though to inter 
rwhVrer'ai” of ''°w many items students get 

ance In compares to group perfo™ 

cntoion and t^.“ t'd performance 

deteOTmed imh f ? u®'' =“">0 pre 

referencmg “ ™ ” approach criterion 

eaUed cnt'SfoTreferlnM?'^ automatically 

enced SteZrefaen^eH norm refer 

or related to behavioral refel^T^rrf' Performance is linked 

has been designed an"^^^^^^^^^ 

In other words the test hvde ““ such a basis (Jackson 1970) 
a student s ability to Lrm onfl '"fonnatiDn about 

terms If the test is one S add ‘“a Performances in absolute 
must have some basis for sayine iLi"”* ^P^'^eting fractions we 
and subtract fractions if he or she In 

- *e 

f ''eesas { 1966!™^™*“™*’™"'''“^'' “ ^pfemnced’ 

"8 be that training has rSiltld ■■ f°f eeferenc 

test IS given to a group both he? “erease in proficiency If a 

^ Unfer.unaielyfevvof.u,, , “>■ P '«t is criterion 

rrler---aVrh- 




duv ot a I^v, "as „eum„e„Uea o„ p,,, ,,3 





(1) Prepare a content outline listing the skills and knowledge that
    the test is an attempt to measure (this is the content outline
    prepared as a basis for appropriateness; see pages 67-70).

(2) Identify the performances (i.e., measurable objectives) of
    which the test taker should be capable, assuming that he or she
    has acquired proficiency in the skills and knowledge measured.

(3) Identify the domain that each objective defines, write items

the process since the valida i 8 

rted'’X-pSmr::::ewLh^r.h=yposses^ 

skills and knowledge on **'* score showing 

(5) Decide upon or „us, obtain to indicate suffi 

form the criterion behaviors 

The important features of criterion referenced tests, therefore, are

(1) that they are based on a set of behavioral or performance
    objectives which they attempt to measure,

(2) that they are designed to have a high degree of appropriateness
    by virtue of being based on objectives,

(3) that they represent samples of actual behavior or performance,
    and

(4) that performance on them can be interpreted in terms of
    predetermined cutoff scores.

The same test can be interpreted on a norm referenced basis, a
criterion referenced basis, or both, provided that objectives or
discrete content categories served as a basis for writing items.
The essential difference between criterion referenced and norm
referenced interpretation is that the former is based on
predetermined cutoff scores (presumably intrinsic to the required
performance itself) while the latter is based on the performance of
a norm group (an extrinsic basis for interpretation).

(that IS representing its do^m measure each objective 

formancc levels * (41 admtnicf ^ ^ ^ Pt’esettmg acceptable per 

=>'ng them p=rfo™an« . ' L™? "’/.f ‘ “> -nd evLu 

whose perfomiance requirements thev “f objectives 

Cmenon referencing" *£ ““ 

nbjectucs and items to measurriho!. ‘“‘*“8 

be readily done by teache.?trd.l?r Wh'le this can 

objectives and items required for a d™, writing the 

enccd achievement test^is a mamr cntenon refer 

pliable for testing companies than for Perhaps more 

"en a testing company will find it ^ Personnel But 
efemneed .os, on a nat.oiial „ eve n^'‘ 
epproach would sacrifice the la™ ?" ‘^^Sional basis since such an 
penicular school system Tfe aefS”' f ‘'=^‘ needs of a 






Such criterion referenced tests, as they appear, will have the
advantage of allowing each school district to target its testing
program to its own goals and to monitor goal attainment in an
absolute rather than a relative sense. It will not suffice to say that
Blair has learned more than Bret. We will have to know the number
of goals that each has met in order to certify advancement to
new and more complex ones. Considerable help will be afforded by
test publishers who offer schools test items matched to objectives
so that each school can shape and form its own achievement test
geared to the needs of its own students.









DETERMINING A TEST'S INTERPRETABILITY

By applying the concepts of interpretability described on the
previous pages, a teacher can determine a test's interpretability. To
assist the teacher, the checklist questions below are offered.

Is My Test INTERPRETABLE?

(1) Do I Know How the Scores Relate to Relevant Performance

    a. Is my test referenced in terms of some criterion (e.g., my
       objectives)?

    b. Can I tell what a high score and a low score mean? Or, can
       I report the specific objectives on which proficiency has
       been demonstrated?

    c. Can the results for an individual student be used as a
       specific indication of level or degree of proficiency?

(2) Do I Know What Defines Acceptable Performance

    a. Have I preestablished cutoff scores (e.g., passing grade) and
       if so, on what basis?

    b. Do I have some concrete and verifiable way to say whether
       a particular performance suffices in terms of objective
       specifications of acceptability?

(3) Does the Test Provide Diagnostic and Evaluative Information

    a. Does it tell me the areas in which a student needs help?

    b. Does it tell me the areas in which the class needs help?

    c. Does it tell me the areas in which instruction needs
       improvement?

(4) Does It Provide Useful Relative Information

    a. Does it provide the kind of data that I can compare
       meaningfully with results of past and future testings?

    b. Can the results be interpreted on a norm referenced basis
       if that is desired?

What does a test score tell about a student? Usually it is the
number of items right. What does that number convey? Has the
student demonstrated proficiency on the objectives being measured?
Does the student have the prerequisites for the next level of
performance? Is he or she able to perform skills at a level required
for employment in some occupation? In itself, a test score tells
little; to be useful, test scores must be interpreted.





If a test is based on a set of objectives and if the objectives
themselves have some validity, then the score on that test must mean
something about a student's proficiency. A criterion referenced test
provides information about students' degree of proficiency on the
objectives of which the test is a measure. A high score on the test
should be indicative of achievement of objectives and a low score
indicative of a lack of achievement. You may think, perhaps, that
the test was too hard and hence not a fair measure of achievement.
If your test has already met the criteria of appropriateness,
validity, and reliability, it is likely that the test is a fair
measure of achievement, and thus you can conclude that test scores
reflect achievement of objectives.

To adequately interpret test scores, it may be helpful not to
restrict yourself to the total score. Since a criterion referenced
test is like a collection of mini-tests, each measuring a single, but
related, objective (and each containing a minimum of two, and
often many more, items), test performance can be described by
reporting the specific objectives that each student has demonstrated
proficiency on by virtue of passing the appropriate mini-test. Thus,
instead of having to decide how high a score must be to be
considered "high," you simply list for each student the name of
each objective on which he or she has demonstrated proficiency
along with (if desired) an indication of the level of proficiency
attained. Interpretability of such tests is primarily based,
therefore, not on the total score but on indications of degree of
proficiency attained for each objective of which the test is a
measure. Since the objectives represent our real interest and
criterion referenced tests are based on objectives, we can focus
our test results and interpretations on the very learning we are
interested in enhancing.
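
The idea of treating a criterion referenced test as a collection of
mini-tests can be sketched in a few lines of Python. This is only an
illustration, not a procedure taken from the text: the function name
`objective_report`, the item numbers, and the 80 percent cutoff
(`min_fraction`) are all assumptions chosen for the example, and each
teacher would substitute his or her own objectives and cutoffs.

```python
from collections import defaultdict


def objective_report(item_results, item_objective, min_fraction=0.8):
    """Summarize one student's performance objective by objective.

    item_results:   {item number: True if answered correctly}
    item_objective: {item number: name of the objective it measures}
    min_fraction:   hypothetical cutoff for "proficient" (an assumption)
    Returns {objective: (items right, items given, proficient?)}.
    """
    totals = defaultdict(lambda: [0, 0])  # objective -> [right, given]
    for item, correct in item_results.items():
        objective = item_objective[item]
        totals[objective][1] += 1
        if correct:
            totals[objective][0] += 1
    return {
        objective: (right, given, right / given >= min_fraction)
        for objective, (right, given) in totals.items()
    }


# Hypothetical five-item fractions test covering two objectives.
item_objective = {
    1: "add fractions",
    2: "add fractions",
    3: "subtract fractions",
    4: "subtract fractions",
    5: "subtract fractions",
}
student = {1: True, 2: True, 3: True, 4: False, 5: False}

report = objective_report(student, item_objective)
proficient = [name for name, (_, _, ok) in report.items() if ok]
print(proficient)  # objectives on which proficiency was demonstrated
```

The report lists objectives rather than a single total, so the total
score (3 of 5 here) never has to be labeled "high" or "low."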

Must a student pass every item that measures a given objective
to be judged as competent on that objective? That is, how are we
to interpret performance? Because of variability in item difficulty
and the presence of various types of unsystematic errors, it is not
unreasonable to allow some room for error. Each teacher must decide
for each objective what the acceptable margin of error should be,
and performance can then be examined in order to make that
judgment. However, if items that seem to be faulty are replaced or
revised during


Relation of Test Scores to Relevant Performance

Defining Acceptable Performance

Diagnostic and Evaluative Information


establishment of reliability, the remaining margin of error can be
made as small as 10 to 20 percent to allow for unsystematic errors.
The amount of error that can be tolerated is proportionate to
the number of items used to measure a particular objective. Where

tenon could be used for evaluatmg total tesroerf 

related objeetives, hut not Performance ri^d*dTa“%^re? 

ment but teL yormouTrnstrnctmJ 

examine individual performance vnti t, well If you 

each student's degree of proficient on °'*k ‘*®‘=rmine 

fie^ncy is sufficient, progress * '’>>J“hve Where pro 

Where proficiency has no7 a, " '“a '“"’"S ^rWs 

•ton aimed directly at those ' mstruc- 

judgments of gr<f„Tpro"t‘“'''',‘f ' <>= 

shouM be provided before ,n™roSro'" remedial ins, 1^101^0 

Finally, the mterpretable ip^i Progress to new areas 

udrquaey of instnicl Ttofic, ‘"formation ol the 

E§SH£§S3k 

« a Sr,;?55l..S -b' -"."er™ S'b'eJllr "» 

J ‘-lives measured by 



OBJECTIVE 1

Number of students          Number of students
showing proficiency         not showing proficiency


experiences were insufficient for achievement of this objective.
Changes in lesson plans or learning materials for teaching this
objective should be seriously considered for subsequent instruction
to augment or replace those currently in use.
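
A tally of this kind, objective by objective, is easy to automate. The
sketch below is an assumption-laden illustration rather than a method
given in the text: the function names, the student "Lee," and the rule
of flagging an objective when more than half the class fails to show
proficiency (`max_fraction_failing`) are all hypothetical.

```python
def tally_by_objective(class_reports):
    """Count students showing / not showing proficiency per objective.

    class_reports: {student: {objective: True if proficient}}
    Returns {objective: (number showing, number not showing)}.
    """
    tally = {}
    for proficiency in class_reports.values():
        for objective, ok in proficiency.items():
            shown, not_shown = tally.get(objective, (0, 0))
            if ok:
                tally[objective] = (shown + 1, not_shown)
            else:
                tally[objective] = (shown, not_shown + 1)
    return tally


def objectives_to_reteach(tally, max_fraction_failing=0.5):
    """Flag objectives where too many students fell short.

    The 0.5 threshold is a hypothetical choice, not a rule from the text.
    """
    return [
        objective
        for objective, (shown, not_shown) in tally.items()
        if not_shown / (shown + not_shown) > max_fraction_failing
    ]


# Hypothetical class of three students and two objectives.
class_reports = {
    "Blair": {"Objective 1": True, "Objective 2": False},
    "Bret": {"Objective 1": True, "Objective 2": False},
    "Lee": {"Objective 1": False, "Objective 2": True},
}

tally = tally_by_objective(class_reports)
flagged = objectives_to_reteach(tally)
print(tally)
print(flagged)
```

An objective that most of the class misses points toward the
instruction, while one that only a few students miss points toward
those individual students.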


Test results also represent a source of relative information about
student performance. That is, we can evaluate a student's
performance by comparing it to the performance of other students. In
the largest sense, this is called norm referencing. To use the concept
of norm referencing in your own classroom, place the total scores on
the test of all your students who are taking (or who have taken)
the test in rank order, going from the highest score to the lowest,
and then assign each score a rank, starting with 1 for the highest.
You can then assign each score a percentile rank or convert it to a
stanine score (using the procedures described earlier in this
chapter and in Appendix C), or simply separate scores into the top
fifth, second fifth, middle fifth, fourth fifth, and lowest fifth, with
approximately one fifth of the scores in each category.

You must now decide whether this kind of information is useful
in interpreting test scores in addition to, or instead of, the
criterion referenced concept of proficiency on objectives relative
to some intrinsic standard. (On attitude tests, for example, normative
information is considerably easier to interpret than criterion
referenced information because specific objectives or subtopic areas
do not have quite as much independence and meaning as they do in
cognitive or performance areas.) Norm referencing test scores does
represent a way to provide them with meaning, at least in relative
terms. Where you are uncertain about the properties of your test,
the normative approach is recommended. Where you have more
confidence in the meaning of your objectives and the test that
measures them, criterion referenced interpretation is the more
informative of the two approaches for evaluation.
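
The classroom ranking procedure described above can be sketched in
Python as follows. The names and scores are invented for illustration.
The percentile rank formula used here (percent of scores below, plus
half of those equal) is one common convention and is an assumption; it
may differ in detail from the procedure given earlier in the chapter
and in Appendix C.

```python
def rank_scores(scores):
    """Rank 1 = highest score; tied scores share the better rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(score) + 1 for name, score in scores.items()}


def percentile_rank(scores, score):
    """Percent of scores below `score`, counting half of any ties.

    This midpoint convention is an assumption, not taken from the text.
    """
    values = list(scores.values())
    below = sum(v < score for v in values)
    equal = sum(v == score for v in values)
    return 100.0 * (below + 0.5 * equal) / len(values)


def fifths(scores):
    """Split students into five groups of roughly equal size."""
    labels = ["top fifth", "second fifth", "middle fifth",
              "fourth fifth", "lowest fifth"]
    names = sorted(scores, key=scores.get, reverse=True)
    groups = {label: [] for label in labels}
    for position, name in enumerate(names):
        groups[labels[position * 5 // len(names)]].append(name)
    return groups


# Hypothetical total scores for a class of ten students.
scores = {"A": 95, "B": 90, "C": 88, "D": 85, "E": 80,
          "F": 78, "G": 75, "H": 70, "I": 65, "J": 60}

ranks = rank_scores(scores)
groups = fifths(scores)
print(ranks["E"], percentile_rank(scores, 80), groups["top fifth"])
```

With very small classes the fifths will be uneven, which is why the
text asks only for approximately one fifth of the scores in each
category.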


Useful Relative 
Information 






DETERMINING A TEST'S USABILITY

Listed below are checklist criteria for checking on the usability of
a test.


Is My Test USABLE?

(1) Is It Short Enough to Avoid Being Tedious

    a. Does it stop short of creating fatigue? stress? boredom?

    b. Have I tried to make it as short as possible within the limits
       of reliability?


(2) Is It Practical for Classroom Use

    a. Can it be used conveniently in a classroom?
    b. Is it within the limit of available teacher time?
    c. Can it be used to test all students?

    d. Is it realistic about the kinds of equipment and physical
       setup it requires?


(3) Are There Standard Procedures for Administration

    a. Are there clear written instructions?
    b. Can it be administered by someone other than me?
    c. Can it be given in a nonthreatening, nondiscriminatory way?


(4) Can Students Comprehend It and Relate to It

    a. Is it written at a level students can understand?
    b. Is it interesting, clever, or provocative?
    c. Is it written to engage students?


Our consideration would not be complete were we to overlook
the criterion of usability. Tests must be taken by people, often
children, and to be at all meaningful they must attain a minimum
level of practicality. We must consider, briefly at least, the minimal
criteria of usability.


Tedium

Many tests suffer from the obvious failing of being too long. Such
tests provoke hostility or at least produce boredom and fatigue,
often resulting in invalid test responses. There is no absolute rule
as to how long a test should be. Typically, the more items, the
more reliability; however, tedium can cause unreliability.
What is this point of diminishing returns? It can best be
determined by initially building your test to fill the amount of time
allowed for its completion and then observing students in the
process of taking it. You can generally tell from their reactions
(or from an item analysis) whether your test is too long. If it
provokes restlessness, visible fatigue, and complaints, shorten it for
subsequent use. Occasionally, it may be necessary to divide a test
into two tests and administer them in two sittings in order to cover
all necessary test material. Keep in mind that at least two items
per objective (and if possible more) must be included for
reliability purposes on a criterion referenced test.


Practicality

Some tests are highly inconvenient for classroom use; for example,
they require movement or the use of equipment or perhaps oral
responses. If these features are critical to the appropriateness of
the test, then you may choose to sacrifice some practicality for
appropriateness (which is not at all unreasonable). However, be
sure that you are gaining something in return for the demands in
practicality. Your goal in testing should be to try to maximize
practicality and appropriateness simultaneously. Often, if you
think about it, you can conduct a test in a more practical way than
you might have originally thought. Take-home tests, for example,
might be one way to overcome certain forms of impracticality.
Also, testing rooms may be set up for individualized testing (or
self-testing) without requiring that all students take a test at the
same time.


Administrative Procedures

The written procedures and instructions for administering a test
are part of the test. They should be clear and nonthreatening.
Often, it is helpful if you read them aloud. Opportunities for
questions should be provided. However, if your instructions are clear
and understandable, few questions should be asked. Questions can
serve as a source of information to tell you how your instructions
might be improved.

Instructions should cover such things as (1) what the test is
about, (2) why it is being given, (3) what response format will be
used and how it is to be used, (4) how much time will be allowed,
(5) whether questions will be answered, and (6) how it will be
scored. Proper instructions at the outset may save you having to
give them over and over again on an individual basis.


Readability

As has been said before, a test is useless if students cannot read it.
It must be written at their reading level and not yours. It is also
helpful from the point of usability if students find the test
interesting or even fun to take. Occasionally, humor can be introduced;
sometimes novelty of format helps keep interest. Readability does
not only mean that students can read it; it also means that they
want to read it. Try to be sensitive to their interests and
orientations and to their language. Just as you relate to students
through instruction, you should also try to relate to them through
tests. To test you must communicate; without communication no
testing occurs. Test construction can be creative.


SELECTING A TEST: A SUMMARY 


tmns' 5"=“ 

ob/ecL^i O-* •>■= test measure my 

melmre? ”""• ■' “ 

the «rc™s"aSc« SpVese^^^^ ■" 

nMo7mm‘:;~ tn Figure 


■nsurett s ^C/,ec«,s„orCt„er,on-te,etencedTes,s 

Nn I _ 


I. Is My Test APPROPRIATE?

(1) Does It Fit My Objectives

ttialliinoobiecbves? ™ty objective and 0 items 

(31 unite Ibe Conditier each aclionj 

Ibe Slatemen. o, „ve„s 





(4) Does It Employ the Criteria

a. Is the scoring of each item for a given objective based on the criteria
stated in that objective?

II. Is My Test VALID?

(1) Does It Discriminate between Performance Levels

a. Do students who are independently judged to perform better in the test
area perform better on the test?

b. Do different students with different degrees of experience perform
differently on the various items?

(2) Does It Fit Any External Standard

a. Does success on the test predict subsequent success in areas for
which the test topic is claimed to be a prerequisite?

b. Do students who receive appropriate teaching perform better on the
test than untaught students (or does a student perform better on the
test after teaching than before)?

(3) How Do My Colleagues View the Coverage

a. Do my colleagues in the topic area or at the grade level agree that all
necessary objectives and no unnecessary ones have been included?

b. Do they agree that the items are valid for measuring the objectives?

(4) Does It Measure Something Other than Reading Level or Life Styles

a. Are the demands it makes on reading skill within the capabilities of
the students?

b. Is performance independent of group membership or any other
socioethnic variable?

III. Is My Test RELIABLE?

(1) Are There Paired Items That Agree

a. Do students who get one item of a pair (per objective) right also get
the other right, and those who get one wrong get the other wrong?

b. Have nonparallel items been rewritten?

(2) Is Item Performance Consistent with Test Performance

a. Is each item consistently passed by students who do well on the total
test?

b. Have inconsistent items been removed?

(3) Are All Items Clear and Understandable

a. Have the student responses been used as a basis for evaluating item
clarity?

b. Have ambiguous items been removed or rewritten?

(4) Have Scoring Procedures Proved to Be Systematic and Unbiased

a. Have multiple scorings yielded consistent results?

b. Are scoring criteria and procedures as detailed and as suitable as they
can be?



YES NO


IV. Is My Test INTERPRETABLE?

(1) Do I Know How the Scores Relate to Relevant Performance

a. Is my test referenced in terms of some criterion (e.g., my objectives)?

b. Can I tell what a high score and a low score mean? Or, can I report the
specific objectives on which proficiency has been demonstrated?

c. Can the results for an individual student be used as a specific
indication of level or degree of proficiency?

(2) Do I Know What Defines Acceptable Performance

a. Have I preestablished cutoff scores (e.g., passing grade) and if so, on
what basis?

b. Do I have some concrete and verifiable way to say whether a particular
performance suffices in terms of objective specifications of
acceptability?

(3) Does the Test Provide Diagnostic and Evaluative Information

a. Does it tell me the areas in which a student needs help?

b. Does it tell me the areas in which the class needs help?

c. Does it tell me the areas in which instruction needs improvement?

(4) Does It Provide Useful Relative Information

a. Does it provide the kind of data that I can compare meaningfully with
results of past and future testings?

b. Can the results be interpreted on a norm referenced basis if that is
desired?


V. Is My Test USABLE?

(1) Is It Short Enough to Avoid Being Tedious

a. Does it stop short of creating fatigue? stress? boredom?

b. Have I tried to make it as short as possible within the limits of
reliability?

(2) Is It Practical for Classroom Use

a. Can it be used conveniently in a classroom?

b. Is it within the limit of available teacher time?

c. Can it be used to test all students?

d. Is it realistic about the kinds of equipment and physical setup
it requires?

(3) Are There Standard Procedures for Administration

a. Are there clear written instructions?

b. Can it be administered by someone other than me?

c. Can it be given in a nonthreatening, nondiscriminatory way?

(4) Can Students Comprehend It and Relate to It

a. Is it written at a level students can understand?

b. Is it interesting, clever, or provocative?

c. Is it written to engage students?





On a published test, answers to the criterion questions can
usually be gleaned from the manual of the test coupled with an
examination of the test itself. In the last section of this book,
dealing with published tests, these questions will be applied to some
published tests for purposes of illustration.

The five criteria have been termed appropriateness, validity,
reliability, interpretability, and usability. If a test is not
appropriate for your objectives, it should not be selected, regardless
of the adequacy of its other properties. If it is appropriate, then it
must also fit its own label or title, and hence be valid, to be useful.
If both appropriate and valid, it must then be an accurate or reliable
instrument. It must also be one whose results are interpretable,
since results that cannot be interpreted are meaningless. Finally,
it must be usable if it is to work at all under the prevailing
classroom conditions.

The same five criteria can be applied to teacher-built tests as
well. Teachers must state their purposes or objectives, and their
tests must meet them in an appropriate and valid way. Their tests
must also possess accuracy as measuring instruments, and their
results must be interpretable. In addition, the criterion of usability
must be considered. Usability refers to the practical
characteristics of a test, such as its cost to purchase or develop,
the degree of sensitivity it is likely to arouse, the amount of time it
takes to administer, and the ease in scoring the test and reporting
the results.


The final section of this book deals with using published
tests. Throughout that discussion, the terms and concepts
described in this section will be applied to assist teachers in the
selection, use, and interpretation of these tests.





Additional Information Sources

Angoff, W. H. Scales, norms, and equivalent scores. In R. L.
Thorndike (Ed.), Educational measurement, 2nd ed. Washington, D.C.:
American Council on Education, 1971, Chap. 15.

Gardner, E. F. Interpreting achievement profiles: Uses and
warnings. NCME Measurement in Education: A Series of Special
Reports of the National Council on Measurement in Education, 1
(2), 1970.

Lennon, R. T. Scores and norms. In R. L. Ebel (Ed.), Encyclopedia
of educational research, 4th ed. New York: Macmillan, 1969.

Lyman, H. B. Test scores and what they mean. Englewood Cliffs, N.J.:
Prentice-Hall, 1963.

Womer, F. B. Test norms: Their use and interpretation. Washington,
D.C.: National Association of Secondary School Principals, 1965.



Self-test of Proficiency


(1) Norm-referenced interpretation relates a person's test score to
    ____________ on that test by other people.

(2) Define, in one sentence each, a. norms, and b. norming group.

(3) For each of the illustrations below, identify the type of score
    depicted, that is, standard score, stanine score, percentile rank,
    or grade-equivalent score.

    a. Bobbie's score on the test was higher than 82% of the students
       in the norming group at the same grade level.

    b. Although Bobbie is in the fourth grade, her test score was as high
       as the average fifth grader on the fourth grade test.

    c. Bobbie's test score was two standard deviation units above the
       norming group average score for her grade level.

    d. Bobbie's test score fell into a band of scores of the norming
       group that was represented by the score of 7.

(4) Refer to the sample norms table in Figure 11.4 on page 268 to
    answer the questions listed below.

    Jeff got 48 items right on the Word Study Skills test.

    a. What was Jeff's stanine score?
    b. What was Jeff's percentile rank?
    c. Which of the following words would you use to describe Jeff's
       performance? Poor   Average   Exceptional

(5) Check all of the statements below that are characteristic of
    standardized tests.

    a. Items are purposely written to include ambiguity.
    b. Formal instructions for administration are provided.
    c. Scores are best interpreted in terms of the number of correct
       responses.
    d. Items have been analyzed and refined.
    e. Norms are available for score interpretation.

(6) Name one strength or advantage and one shortcoming or
    disadvantage of norm-referencing a test.

(7) a. Criterion-referenced test interpretation is most applicable in
       the area of ____________ (creativity, skill-testing,
       problem-solving).

    b. Criterion-referenced tests are interpreted on an ____________
       (intrinsic, intuitive, extrinsic) basis.



(8) Check all of the statements below that are characteristic of
    criterion-referenced tests.

    a. Measure samples of actual performance
    b. Scores do not relate to absolute proficiency
    c. Based on objectives
    d. Interpreted in terms of predetermined cutoffs
    e. Reflect performance relative to other students

(9) You have just administered a math test to your sixth grade class.
    Describe how you might interpret individual scores in terms of
    a. relevant performance, and b. acceptable performance.

(10) How would you use the test results obtained in item 9 a. to identify
     ineffective instruction, and b. to provide useful relative
     information about student performance?

(11) Which usability criterion does each example below violate?

    a. The test required that each student individually view pictures
       that were unavailable in original form and which, had they been
       available, would have been too costly to reproduce in sufficient
       numbers.

    b. A test was written at the vocabulary level of the teacher, not that
       of the students.

    c. The test went on and on and everyone started to fall asleep.

    d. The teacher was absent on the day of the testing and the
       substitute had trouble figuring out how to explain to the students
       how to take the test.

(12) How might each of the problems illustrated in item 11 be solved to
     make the test usable?



part four / Using Published Tests



chapter twelve / Measuring Intelligence or Mental Ability


OBJECTIVES 1. Identify and explain definitions of intelligence or
mental ability as a. a general intellectual capacity,
b. groups of traits, c. adaptability, d. whatever
an IQ test measures, e. a structural configuration
of discrete factors, f. learning ability, g.
school performance, h. a two-level process, i. a
composite of mental abilities j a distinct charac 
lAcist.’/c ijn-vOati/v}. ajj/i A 

2 Classify verbal and figural types of intelligence 
test items 

3 Distinguish among and determine three types 
of intelligence test scores, namely mental age. 
intelligence quotient (ratio IQ), and deviation 
IQ 



4 Categorize the scores on intelligence tests in terms 
of their a reliability, b stability, and c validity 

5 Describe some commonly used intelligence or 
mental ability tests and distinguish among their 
characteristics 

6 Describe tests to measure creativity 

7 Explain the meaning and educational value of 
intelligence test scores 


THE CONCEPT OF INTELLIGENCE OR MENTAL ABILITY 


Intelligence is a concept surrounded with considerable
controversy. To be able to place that controversy in perspective and to
evaluate the use of intelligence tests, teachers must understand
what this type of test measures. Before moving on to the more
recent, and more widely used, types of published tests, intelligence
testing will be discussed. Because of the general and amorphous
nature of the intelligence concept, test publishers have begun to
replace it with the perhaps more delimited concept of mental
ability. In this chapter, we will use both terms as they seem
appropriate, but the reader is urged to keep in mind that mental
ability may be a less charged and more apt term in our times.

Because of the many uses of the term intelligence and the
fact that intelligence tests are often used to predict human
potential, it is important to be as specific as possible about what the
term intelligence (and its companion term, mental ability) means.
Let us consider some specific definitions.


Intelligence as a General Intellectual Capacity

Binet (1916; see also Varon, 1935) was interested in measuring
intelligence in order to identify children who required special
educational treatment. He took the position that intelligence was
a general intellectual capacity made up of the following abilities:
(1) to reason and judge well, (2) to comprehend well, (3) to take
and maintain a definite direction of thought, (4) to adapt
thinking to the attainment of a desirable end, and (5) to be
autocritical. Although Binet defined intelligence operationally in
terms of discrete abilities, he viewed intelligence as a single but
complex mental process. He believed it could be measured by using
diverse materials designed to evaluate integrated mental processes
rather than be measured as separate elements.¹ Binet operationalized
his belief by the use of a single score to represent a child's
performance across all of the 30 different tests (or tasks) that made up


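Box 12.1


THE 1908 BINET-SIMON SCALE


The grouping of items at the appropriate age levels is shown below.


Age 3

(1) Points to nose, eyes, mouth

(2) Repeats sentences of six syllables

(3) Repeats two digits

(4) Enumerates objects in a picture

(5) Gives family name

Age 4

(1) Knows own sex

(2) Names certain familiar objects (key, knife, penny)

(3) Repeats three digits

(4) Perceives which is the longer of two lines (5 and 6 cm in length)

Age 5

(1) Indicates the heavier of two cubes (3 and 12 grams, 6 and 15 grams)

(2) Copies a square

(3) Constructs a rectangle from two triangular pieces of cardboard,
having a model to look at

(4) Counts four coins

(5) Repeats a sentence of ten syllables

Age 6

(1) Knows right and left as shown by indicating right hand and left ear

(2) Repeats sentence of sixteen syllables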

S^erSaTLS,;*"" “ 

capacity was present in 



The Concept of Intelligence or Mental Ability 31‘ 


(3) Chooses the prettier in each of three pairs of faces (aesthetic com- 
parison) 

(4) Defines familiar objects in terms of use 

(5) Executes three commissions 

(6) Knows own age 

(7) Knows morning and afternoon 

Age 7 

(1) Perceives what is missing in unfinished pictures 
{ 2 } Knows number of fingers on each hand an on both hands without 
counting 

(3) Copies a written model ( 'The little Paul") 

<4\ Copies a diamond 

(5) Describes presented pictures 

(6) Repeats five digits 

(7) Counts thirteen coins 

(8) Identifies by name four common coins 

Ages 

(1) Reads a passage and remembers two items 

(2) Adds up the value of five coins 

(3) Names four colors, red, yellow, blue, green 

(4) Counts backwards from 20 to 0 

(5) Writes short sentence from dictation 

(6) Gives differences between two objects 

Age 9 

(1) Knows the date day of week, day of month, month of year 

(2) Recites days of week 

(3) Makes change four cents out of twenty m playstore transaction 

(4) Gives definitions which are superior to use, familiar objects are 
employed 

(5) Reads a passage and remembers six items 

(6) Arranges five equal-appearing cubes in order of weight 

Age 10 

(1) Names the months of the year in correct order 

(2) Recognizes and names nine coins 

(3) Constructs a sentence in which three given words are used (Paris,
fortune, gutter)

(4) Comprehends and answers easy questions

(5) Comprehends and answers difficult questions (Binet considered
Item 5 to be a transitional question between ages 10 and 11. Only
about one half of the ten-year-olds got the majority of these correct.)





Age 11 

(1) Points out absurdities in statements

(2) Constructs a sentence including three given words (same as
number 3 in age 10)

(3) States any sixty words in three minutes

(4) Defines abstract words (charity, justice, kindness)

(5) Arranges scrambled words into a meaningful sentence 


Age 12 

(1) Repeats seven digits 

(2) Gives three rhymes to a word (in one minute) 

(3) Repeats a sentence of twenty-six syllables

(4) Answers problem questions 

(5) Interprets pictures (as contrasted with simple description) 


Age 13 

(1) Draws the design made by cutting a triangular piece from the once- 
folded edge of a quarto folded piece of paper 

(2) Rearranges in imagination the relationship of two reversed triangles
and draws results

(3) Gives differences between pair of abstract terms: pride and
pretension


(It is interesting to speculate on how many of these age-graded tasks from
1908 could be passed by 75% of today's children of those respective
ages. More than likely today's 4-year-olds could name objects like a key,
knife, or penny, but how many 13-year-olds today could distinguish
between the terms pride and pretension? Have we become less
bright, or is it simply a matter of changing cultures? Such differences in
standards certainly point up the importance of constantly updating tests.)

From Binet, A., and Simon, T. Le développement de l'intelligence chez les
enfants. L'Année Psychologique, 1908, 14, 1-94.


The position that intelligence is largely a general intellectual
capacity was championed by Charles Spearman, who believed that
all mental activity is dependent primarily upon, and is an expression
of, a common or shared general factor (Spearman, 1927). He
called this factor g and characterized it as mental energy that is
possessed by all individuals but in varying degrees, and that
operates in all mental tasks as a function of the demand they place






Figure 12.1  Intercorrelations among Intelligence Test Subtest Scores*

                                  1     2     3     4     5     6     7
(1) Analogies                     -
(2) Completion                   .50    -
(3) Understanding Paragraphs     .49   .54    -
(4) Opposites                    .55   .47   .49    -
(5) Instructions                 .49   .50   .39   .41    -
(6) Resemblances                 .45   .38   .44   .32   .32    -
(7) Inferences                   .45   .34   .35   .35   .40   .35    -

* From Spearman, C. The Abilities of Man. New York: Macmillan, 1927, p. 149.


upon intelligence. He based the existence of g on the interrelatedness
of performance on each of an intelligence test's subtests.
(Interrelatedness is traditionally expressed in terms of a correlation
coefficient that can range from -1.00 to +1.00, with zero
indicating no relationship.) Spearman (1927) obtained the pattern
of intercorrelations shown in Figure 12.1, which indicates
that all the subtests have something in common in terms of
performance.
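
The pattern Spearman pointed to can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the values of Figure 12.1 as read from the table, stores them as a lower triangle, and computes the average intercorrelation overall and for each subtest. The names `LOWER_TRIANGLE`, `all_pairs`, and `mean_correlation_per_subtest` are hypothetical and are not Spearman's procedure.

```python
# Lower triangle of the subtest intercorrelation matrix, as read from
# Figure 12.1 (Spearman, 1927). Row i holds the correlations of subtest i
# with the subtests listed before it.
SUBTESTS = ["Analogies", "Completion", "Understanding Paragraphs",
            "Opposites", "Instructions", "Resemblances", "Inferences"]

LOWER_TRIANGLE = [
    [],
    [0.50],
    [0.49, 0.54],
    [0.55, 0.47, 0.49],
    [0.49, 0.50, 0.39, 0.41],
    [0.45, 0.38, 0.44, 0.32, 0.32],
    [0.45, 0.34, 0.35, 0.35, 0.40, 0.35],
]


def all_pairs(triangle):
    """Flatten the triangle into one list of pairwise correlations."""
    return [r for row in triangle for r in row]


def mean_correlation_per_subtest(triangle):
    """Average correlation of each subtest with all of the other subtests."""
    n = len(triangle)
    means = []
    for i in range(n):
        # The matrix is symmetric, so look up each pair in the lower triangle.
        others = [triangle[max(i, j)][min(i, j)] for j in range(n) if j != i]
        means.append(sum(others) / len(others))
    return means


pairs = all_pairs(LOWER_TRIANGLE)
overall_mean = sum(pairs) / len(pairs)
per_subtest = dict(zip(SUBTESTS, mean_correlation_per_subtest(LOWER_TRIANGLE)))

print(f"{len(pairs)} pairs, mean r = {overall_mean:.2f}, "
      f"range {min(pairs):.2f} to {max(pairs):.2f}")
for name, value in per_subtest.items():
    print(f"{name:26s} {value:.2f}")
```

Every pairwise value is positive and of moderate size, which is the kind of evidence read as "something in common"; none approaches 1.00, which leaves room for factors specific to each subtest.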

Because the intercorrelations between subtest scores are high
but not perfect, Spearman postulated the existence of specific
factors, called s, specific to particular types of activity. Thus, all
mental activities were seen as having a general component, reflecting
general intellectual capacity, and a specific component, somewhat
unique to the activity itself, with the general one being the
more important. It is this belief that justifies the practice of adding
the items correctly passed in the various types of test activities
to provide a single score that is then used to represent an
individual's general intelligence level.

Given the kinds of interrelationships among subtests illustrated
in Figure 12.1, one can be impressed either with the degree
of overlap or with the degree of nonoverlap. The latter emphasis gave
rise to a more trait-oriented view of intelligence.


Intelligence as
Groups of Traits

Intelligence has also been viewed as a combination of groups of
traits or factors. Each of the traits within a group has more in
common with other traits within that group than with traits outside
the group. Thurstone (1938, 1943) called this group-factor
theory and identified the following primary mental abilities as
group factors:

(1) The Number factor (N): ability to do numerical calculations
rapidly and accurately

(2) The Verbal factor (V): found in tests involving verbal
comprehension

(3) The Space factor (S): involved in any tasks in which the
subject manipulates an object imaginally in space

(4) The Word Fluency factor (W): involved whenever the subject
[...] isolated words at a rapid rate

[...]

[...] Thurstone's six primary factors is related to
performance on each of the others, suggesting that [...] concept of
abstract intelligence as hypothesized by Spearman. It seems
reasonable to conclude that [...] defined as a collection of [...]
general unifying factor. In other words, [...] points of view seem
to have some validity.

[...] approach. However, most current intelligence (or mental
ability) measures provide a single overall score, usually [...],
suggesting an endorsement, at least operationally, of the general
factor approach.


Intelligence as
Adaptability

Wechsler (1944 [...]) [...] "the aggregate or
global capacity of the individual to [...]

[...]


one would be hard pressed to argue whether the elements in either
definition are measured, or can be measured, in an intelligence
test. While it may be useful to think of intelligent people as being
adaptable (although notable exceptions can be found), the evidence
for such adaptability would have to come from a reasonably
long-term examination of a person's behavior and not from a test.
If a concept is reasonable but not measurable, its usefulness in
education is limited.


Intelligence
as Whatever an
Intelligence Test
Measures

Boring (1923) suggested that the best definition of intelligence is
whatever the test measures. Certainly, this definition is operational,
although it has some notable failings. First of all, many
different tests are available for measuring intelligence (as we shall
see). Which of these will constitute the standard for measuring
intelligence? To use this definition we must settle on one test,
since different tests may yield different results. But how is that
one test itself to be developed without a definition to guide its
development? In addition, current concern over cultural biases in


is to detect a relation and apply it, for example, the operation
would be C (Cognition). Products refers to the kind of outcomes
to be produced. In the model, for example, units refers to discrete
outcomes such as synonyms.

Guilford's model produces 120 cells (4 × 5 × 6) of possible discrete
intelligence factors.* It suggests types of tests that have
hitherto never been invented. It also portrays intelligence, a highly
complex phenomenon, as a combination of small pieces that
conceivably fit together to make a whole. Not only is the approach
quite helpful in understanding intelligence, it is also helpful in
suggesting types of measurement procedures educators need.
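
The count of 120 is simply the number of ways of combining one category from each of the three dimensions. The sketch below enumerates the cells. The category labels are an assumption taken from Guilford's published structure-of-intellect model (four contents, five operations, six products); this passage itself names only Cognition and units.

```python
from itertools import product

# Category labels assumed from Guilford's (1967) structure-of-intellect
# model; only "cognition" and "units" are named in the passage above.
CONTENTS = ["figural", "symbolic", "semantic", "behavioral"]
OPERATIONS = ["cognition", "memory", "divergent production",
              "convergent production", "evaluation"]
PRODUCTS = ["units", "classes", "relations", "systems",
            "transformations", "implications"]

# Each cell is one (content, operation, product) combination.
cells = list(product(CONTENTS, OPERATIONS, PRODUCTS))

print(len(cells))   # 4 x 5 x 6 combinations
print(cells[0])     # the first cell in enumeration order
```

A test of synonyms, for instance, would fall in the cell combining semantic content, the operation of cognition, and the product units.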


Intelligence
as Learning
Ability

Intelligence tests have been criticized on the grounds that they
measure what a person has learned, not what he or she can learn,
and (it is argued) the two are distinctly different. If the purpose
of intelligence testing is to make decisions about future learning
experiences, the value of the tests may be greatest if they can tell
us how likely a person is to learn given the proper circumstances.
Piaget (1950, 1952), for example, defines intelligence in terms of
assimilation and accommodation. Assimilation refers to changes
that are made in what is taken into the mind in order to fit it
into one's scheme of things, while accommodation refers to
changes that occur in one's own internal structures as a result of
new experiences. These two processes occur together to enable a
person to learn and grow.

If intelligence were to be defined as learning ability, it might
be measured in terms of one's ability to improve as a result of
instruction. Feuerstein (1968) has developed a procedure wherein
he first attempts to teach a child something and then measures
how well he or she has learned it. If that which is to be learned is
unfamiliar, then the ease with which a child learns it would be
indicative of his or her learning ability. (Courses or books
designed to prepare students for such tests as the College Boards
suggest the application of this definition of intelligence.) To utilize
this definition of intelligence would require a dramatic change


2 After hearing the [...] claimed, "If I were the Almighty, I
wouldn't [...] boxes." Whereupon another quipped, "[...]
wouldn't have stacked them [...]." [...] comments suggest the
difficulty of thinking [...] such a systematic organization of
elements rather than as a totality [...] as a collection of less
organized parts.



in current major approaches to intelligence testing. Rather than
attempting to find out what a child knows or can do, we would
first have to teach it to him or her so that we could determine how
easily it was learned.


Intelligence
as School
Performance


Many people consider school performance to be the application of
intelligence and hence define intelligence in terms of school success.
The high correlations that have been obtained between intelligence
test scores and school performance reinforce such a view
by suggesting that school performance is a manifestation of intelligence.
However, we know that correlation does not necessarily
imply causation; that is, the fact that intelligence and school
achievement are correlated does not necessarily mean that high
intelligence causes good school performance.*

There are instances when, in fact, we might conclude the
reverse to be true. Rosenthal and Jacobson (1965) showed that
teachers' expectations could affect students' scores on intelligence
tests, presumably by improving their performance in school. (They
demonstrated this in a way that did suggest causation.) Students
who perform in school as they are expected to may come to regard
themselves as bright (since others view them that way) and
subsequently perform brightly, in class as well as on IQ tests. For
this reason, and because of the other factors that affect school
performance (such as hunger, fatigue, disinterest), it is wise not
to regard intelligence and school performance as synonymous. Both
probably reflect the kinds of prior experiences a person has had,
others' expectations for him or her, and, to a high degree, a person's
self-expectations; this overlap in causality accounts in part
for the high intercorrelation.


Intelligence
as a Two-level
Process

Jensen (1968) hypothesizes that intelligence is a two-level process,
with Level I being associative intelligence and Level II being
abstract intelligence. The first level refers to those kinds of tests
that rely on memory and the building of simple associations.
Gagne (1965) separates the association process into (1) signal
learning, (2) stimulus-response learning, (3) chaining, and (4)
verbal association. The second level refers to thinking and
problem-solving skills, which Gagne subdivides into (5) multiple
discrimination, (6) concept learning, (7) principle learning, and
(8) problem solving.

Most intelligence tests would seem to be mixtures of these
two processes, with some tasks relying principally on association
(e.g., identifying and recalling things that go together) and some
relying on abstraction (e.g., finding common elements). For the
most part, however, the more common emphasis is on abstraction,
thereby penalizing those with associative but not abstract
intelligence.


Intelligence
as a Composite
of Mental
Abilities

Considering all of these definitions of intelligence, an attempt will
be made here to create some reasonable composite for educational
purposes. It would seem intelligence might be defined as a
composite of intellectual skills or mental abilities that can be
specified in detail based on the tests themselves and that are
influenced by and related to a learning environment designed to
reinforce or require these skills.

The above definition has two parts. The first part specifies
the kinds of performances that are measured by intelligence or
mental ability tests, these being areas such as

inductive reasoning

verbal comprehension and fluency

spatial relations

numerical skills

figure comprehension

Note the wide range of these skill categories in covering the range
of intellectual performance or mental ability.

The second part of the definition relates in part to where
intelligence comes from and in part to where it leads. According
to the definition, intelligence is "more than a blessing and less
than a blessing." It is derived in part by experiencing an environment
(both home and school) that reinforces it (i.e., its use or
manifestation is rewarded), and it can be successfully applied
in an environment that requires its application for success (of
which school is the prime instance).

The composite definition gives us a way of thinking of intelligence
in terms of many of the properties that have already been
discussed under the preceding definitions. For the remainder of
the chapter it will be helpful to think of intelligence as a limited
cluster of mental abilities with both a source and a function.






Intelligence
as a Form of
Ability, Aptitude,
and Achievement

[...] ability, and achievement.
Some attempt to clarify their differences in meaning would seem
in order at this point.

[...] preceding pages, is not
easily defined nor [...] accepted by all.
Using a composite of the definitions, [...] have to say that
intelligence describes a somewhat [...] of mental traits that
are commonly referred to as [...]
intelligence and mental ability [...] interchangeably
(with more current usage tending [...] the latter).

Freeman (1955) offers the following as the definition of an
aptitude: [...] characteristics indicative
of an individual's ability to acquire with training some specific
knowledge, skill, or set of [...] ability to
speak a language, to become a musician, to do mechanical work,
etc. An aptitude test, therefore, is a device designed to indicate a
person's potential ability for [...] of a certain type of
activity of a specialized kind [...] range.
When such aptitude tests [...]

^enS's S°ri' ^Pa- cetait'. mllCfe 

However linee In accuracy, they are cons^d a ‘'““amg. and 
tests oftln “ 11 '"cbal reasomng “g aPftude tests 

subtests have hem'Z j *"!' mtelhgenee"' 

S ^ (Ills acad™,. .p„ 

a.trrfe'a.s£?=~i:; 

Tests of lehiev, ““n essentially 

« - nest 

"" Of '^rsjri^d^SlnfiS 



extent on the detection of prior learning. Thus, at the level of
definition, achievement may be no easier to separate from intelligence
than were aptitude and ability. In the final analysis we
will have to examine the tests themselves and the uses to which
they are put to determine the nature of the differences.

It is important to reinforce the point that intelligence tests and
achievement tests have different purposes. Achievement tests are
used to assess what a student has learned, while intelligence or
mental ability tests are used to predict what a student can learn.
That is to say that achievement tests are edumetric and intelligence
tests psychometric, as delineated in Box 8.1 on page 210.
Consequently, in intelligence testing, the content of the items is not
as important as their predictive validity, that is, their value for
educational decision making. However, the success of this distinction
as an operational fact is limited in the eyes of some. An examination
of test items and test properties may help clarify this point.


TEST ITEMS TO MEASURE INTELLIGENCE
OR MENTAL ABILITY

What intelligence or mental ability tests measure may be better
understood by examining some of the kinds of test items (that is,
the kinds of performances) utilized in them. The items used in a
test serve as the best operational definition of intelligence as used
within that test. After examining a test's items, you can determine
for yourself what the test seems to be measuring.

The kinds of items used for illustrations on the next page are
ones that one might find primarily on group-administered intelligence
tests. No attempt has been made to illustrate every variation,
just the main types. Both verbal and figural item types are
shown. They are based on Guilford (1967).*

* These items are not taken directly from published intelligence or mental
ability tests and are not written at a level intended for children. They are
intended to illustrate some of the processes tested by both individual and
group tests. Only verbal and figural items are covered. Other types of
items, such as memory items, have not been illustrated. In memory items,
for instance, the test taker is given a list, usually of words or numbers,
and asked thereafter to recall as many of them as he or she can. Or else
he or she will be given a list of words, numbers, or pictures to examine
and then given a second list containing some of those on the original list
and asked to recognize or identify all of the stimuli on the second list that
also appeared on the first list.







Verbal Items: Examples

Word Substitution

• Which of the words below is the best substitute for the italicized
word in the following sentence?

He was a good doctor, but alcohol was his ruin.

a. plague          c. fate

b. undoing         d. destiny

Synonyms

• Which word means the same as the given?

TEMPERAMENT

a. angriness       c. hostility

b. popularity      d. disposition

Word Classification

• Which word does not belong?

a. horse           c. mosquito

b. flower          d. snake

Verbal Analysis

• Which word should go in the blank space to fulfill relationships
that call for it?

COLD     HOT     UP     ____

a. down            c. low

b. high            d. under

Word Class

• Into which one of the four classes does the given word best fit?

PALM

a. plant           c. tree

b. flower          d. leaf

Verbal Relations

• Which alternative pair comes nearest to expressing the relation
of the given pair?

BIRD : SONG

a. fish : water        c. pianist : piano

b. person : speech     d. horse : ranch



Figural Items: Examples

Recognition of Objects

• What is the object?

[Figure: object to be identified]

Figure Matching

• Which alternative (at the right) is most nearly like the test
object (at the left)?

[Figure: test object with alternatives A through E]

Figural Relations

• What kind of figure should appear in the cell with the question
mark?

[Figure: matrix of figures with alternatives A through E]





Spatial Visualization

• Diagrams I and II show two steps in folding a square piece of
paper and cutting a notch in a certain location. Which alternative
shows how the paper would look when unfolded?

[Figure: diagrams I and II with alternatives A through E]

Hidden Figures

• Which of the five simple figures at the top is concealed in each
of the item figures?

[Figure: five simple figures followed by the item figures]

Identical Figures

• Which figure in the row is exactly the same as the one at the
left?






Recognition of Figural Classes

• Which figure does not belong to the class determined by the
other three figures?

[Figure: four figures, alternatives A through D]


TYPES OF SCORES

In this section we will consider how intelligence tests are used in
terms of the types of test scores: mental age, intelligence quotient,
and deviation IQ.

Mental Age*

Mental age is a [...] that is determined by comparing a child's
score with [...] his or her age-mates and with the
scores obtained by [...] older children in the norming
group. A score equaling [...] age of 5-0, for example, would
indicate that the score was [...] same as the average score
obtained by a sample group [...], although the child
obtaining the [...] older or younger than five.

Describing performance [...] intelligence or mental ability test
in terms of mental age [...] helpful in interpretation.
However, like grade [...], mental ages should not be interpreted
literally because [...] average performance
[...] often relate educational
decisions to a child's age; thus, the idea of being able to express

* While this term may be commonly referred to, it is not the most
[...] reported or interpreted [...] intelligence or mental ability
[...]. Its major use is in [...] intelligence tests [...]


Answers

b, d, b, a, c, b, airplane, c, a, c, 1-[...], 2-d, d, c




intelligence in terms of mental age, that is, the age of [...]
obtaining that same score on the average, provides the kinds of numbers
we are accustomed to using and interpreting. In other words,
age groups are meaningful reference groups for educators, and so
referencing intellectual functioning in terms of the average
performance of age groups can be helpful when used in conjunction
with other kinds of information.

Since children normally develop rapidly, the difference in
performance between a child on his or her birthday and another
child 11 months later (although technically both are still the same
age) will be considerable. To deal with this problem, M.A.'s are
expressed in a way that reflects both years and months of age.
Because the 12-month year does not lend itself to the use of
decimals, M.A.'s are expressed as shown below.*


5 years, 0 months      5-0

5 years, 2 months      5-2

5 years, 4 months      5-4

5 years, 6 months      5-6

5 years, 8 months      5-8

5 years, 10 months     5-10
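
The years-months notation is easy to mishandle if it is read as a decimal. As a minimal sketch (the function names `months_to_mental_age` and `mental_age_to_months` are hypothetical, introduced only for illustration), the conversion in both directions looks like this:

```python
def months_to_mental_age(total_months: int) -> str:
    """Express an age in months in the years-months notation, e.g. 68 -> '5-8'."""
    if total_months < 0:
        raise ValueError("age cannot be negative")
    years, months = divmod(total_months, 12)
    return f"{years}-{months}"


def mental_age_to_months(notation: str) -> int:
    """Convert years-months notation back to months, e.g. '5-10' -> 70."""
    years_text, months_text = notation.split("-")
    years, months = int(years_text), int(months_text)
    if not 0 <= months <= 11:
        raise ValueError("months must be between 0 and 11")
    return years * 12 + months


print(months_to_mental_age(68))       # 5 years, 8 months
print(mental_age_to_months("5-10"))   # 5 years, 10 months, in months
```

Note that 5-10 is later than 5-6, which a decimal reading (5.10 versus 5.6) would get backwards; converting to months first avoids the problem.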


Normally, the test publisher identifies the test score and age of
each child in the norming group or standardization sample. The
average score for each age group in the norming sample is determined.
The mental age for a particular score, then, is the chronological
age of children in the norming group for whom that score
was the average score. These mental age "equivalents" are shown
in a norms table in the test manual.
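
The procedure just described can be sketched in a few lines. Everything in the example is hypothetical: the norming sample is invented, the names `build_norms` and `mental_age_for` are illustrative, and looking up the nearest tabled score is an assumed simplification (a published manual would list every score).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical norming sample: (chronological age in months, raw score).
NORMING_SAMPLE = [(60, 40), (60, 44), (62, 45), (62, 47), (64, 49), (64, 51)]


def build_norms(sample):
    """Map each age group's average raw score to that group's age in months."""
    scores_by_age = defaultdict(list)
    for age_months, raw_score in sample:
        scores_by_age[age_months].append(raw_score)
    return {round(mean(scores)): age for age, scores in scores_by_age.items()}


def mental_age_for(raw_score, norms):
    """Return the mental age (years-months) for the nearest tabled score."""
    nearest = min(norms, key=lambda tabled: abs(tabled - raw_score))
    years, months = divmod(norms[nearest], 12)
    return f"{years}-{months}"


norms = build_norms(NORMING_SAMPLE)
print(norms)                      # average score -> age in months
print(mental_age_for(46, norms))  # score equal to the 62-month average
```

A child of any chronological age who earns a raw score of 46 in this invented sample is assigned the mental age of the group whose average was 46.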

Such a table is shown in Figure 12.3, as taken from the manual
of the Peabody Picture Vocabulary Test (Dunn, 1965). Once a
child's score is determined, the scorer looks up that score in the
table to discover the age of children for whom that score was the
average. That age, expressed in years and months, represents the
tested child's mental age. For example, a child who obtained a raw
score of 91 on Form A of the PPVT (as shown in Figure 12.3)
would have performed as a 13-year, 2-month-old person, that being
the average age of people in the norming group who earned a score


Figure 12.3  Norms Table for Converting Raw Scores on the Peabody Picture
Vocabulary Test to Mental Ages, Forms A and B*

* Reproduced with permission of the [...], Lloyd M. Dunn, Ph.D.


Intelligence
Quotient
(Ratio IQ)

Although the initials "IQ" have come to be synonymous with intelligence,
they stand very specifically for a quotient or
ratio representing intelligence, computed by dividing mental age
(M.A.) by chronological age (C.A.) and multiplying by 100:

IQ = M.A./C.A. × 100

to the av rige o^yonter chUH ”“n' 

"below averfge ■• nSiv f • «™'d be under 100 or 

began petforLg aW o"r' beir-fv - Siveu age 

- he ah^ove'o tW 


Box 12.2


CLASSIFICATION OF IQ'S:
STANFORD-BINET SCALE, 1916


IQ            Classification

Above 140     "Near" genius or genius

120-140       Very superior intelligence

110-119       Superior intelligence

90-109        [...]

80-89         [...]

70-79         Borderline deficiency

Below 70      Definite feeblemindedness


[...] 1919 (reproduced [...]



children of the same age who have taken the test. Above average
simply means better than one's age-mates. (A classification of
different ratio IQ scores that was used in the early days of
intelligence testing appears in Box 12.2.)

Although it was used for many years as a mental ability score
(and to some extent is being used today), the [...]
poses certain problems. The primary problem occurs in the
measurement of intelligence among adolescents and adults. [...]
approaches the performance limit of the test, his or her mental
age tends to stabilize. Naturally, as the [...]
increases, the IQ ratio will [...]
the same while the denominator, C.A., [...]. Thus a person
will seem to get less intelligent as he or she [...] older, while his
or her intelligence is probably staying the same. [...]
this, the maximum C.A. of 16 was [...] Stanford-Binet,
a limitation in using this ratio.*
To compensate for shortcomings in the intelligence quotient
score, the so-called deviation IQ was developed.
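
The ratio formula and the problem it creates for adolescents and adults can both be seen in a short calculation. This is a sketch under stated assumptions: the function name `ratio_iq` is hypothetical, ages are given in months, and the plateau at a mental age of 16 years is an illustrative value rather than a figure taken from any test manual.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age_months / chronological_age_months


# A child of 8-0 who performs like the average child of 10-0.
child_iq = ratio_iq(10 * 12, 8 * 12)

# Illustrative plateau: mental age stays at 16-0 while the person ages.
plateau_iqs = [round(ratio_iq(16 * 12, age * 12)) for age in (16, 20, 32)]

print(child_iq)      # above 100: ahead of age-mates
print(plateau_iqs)   # the ratio falls although performance is unchanged
```

The second result shows the numerator staying fixed while the denominator grows, so the same performance yields an ever lower quotient.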


Deviation IQ

The deviation IQ (DIQ) is a standard score* with a mean of 100
and a constant value (usually 15 or 16) for the standard deviation
(that is, variability) [...] mean and standard
deviation would be constant across age groups and across levels
of a single test. Briefly, the mean [...] score for each age
group is determined. Any student [...] at the average level
for his or her age group receives the score of 100,
even though the raw score will obviously vary from age group to
age group. The [...], expressed as the standard
deviation (a measure of the [...] tends to depart from the average),
is also determined for each age group.

* A second limitation in the [...] ratio IQ's [...]
the fact that the variability of [...] variability [...] the
same. In other words, on [...] above average [...]
to earn a higher score to be [...] from different [...]
variability. This fact makes it [...] to compare ratio [...]
intelligence tests [...]. Also, the variability [...] a person's
degree of variability [...] constant for different age groups on the
same test, [...] older even though
[...] relative to age-mates stayed the same.

* Refer to the discussion [...] standard scores [...] preceding
chapter. The [...] determination as a standard score does not
yield a quotient.



If a student receives a score that is exactly one standard deviation
(the "average" amount of variability) above the mean, the
deviation IQ score would be 115 or 116. If his or her score is
one half a standard deviation above the mean, the DIQ would be
108; two standard deviations above, 130 or 132; and so on. A score
of 84 or 85 would mean that the person's DIQ fell below the average
of his or her age-mates by an amount equal to about the average
amount of variability in scores.*
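
These figures follow from rescaling a student's distance from the age-group mean. The sketch below assumes a DIQ scale with a mean of 100 and a standard deviation of 15 or 16, as the examples above imply; the function name `deviation_iq` and the age-group mean of 50 and standard deviation of 10 are hypothetical values chosen only to make the arithmetic visible.

```python
import math


def deviation_iq(raw_score, age_mean, age_sd, scale_sd=15):
    """Convert a raw score to a deviation IQ (mean 100, chosen scale SD).

    age_mean and age_sd describe the raw scores of the student's own age group.
    """
    if age_sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (raw_score - age_mean) / age_sd            # distance in SD units
    return math.floor(100 + scale_sd * z + 0.5)    # round half up


# Hypothetical age group: mean raw score 50, standard deviation 10.
for raw in (40, 50, 55, 60, 70):
    print(raw, deviation_iq(raw, 50, 10, 15), deviation_iq(raw, 50, 10, 16))
```

With these assumed norms, a raw score of 60 is one standard deviation above the mean and converts to 115 or 116 depending on the scale, while a raw score of 55 converts to 108 on either scale.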

Before you interpret an IQ score, it is important to know
whether it is a ratio IQ score or a deviation IQ score, because the
two do not parallel one another. The commonly used intelligence
or mental ability tests (described in a subsequent section) all use
the DIQ score. Intelligence test scores may also be expressed as
percentile ranks.* (Percentile ranks were discussed on pages 285-89.)

A form for reporting scores on the Short-form Test of Academic
Aptitude is shown in Figure 12.4. Raw scores are presented
for each subtest, and raw scores, mental ages, deviation IQ scores,
percentiles, and standard scores (called reference scale scores)
are presented for language, nonlanguage, and total. A table for
converting raw scores to deviation IQ scores on the Short-form
Test of Academic Aptitude is shown for illustrative purposes in
Figure 12.5, and a table for converting raw scores on this test to
mental age scores is shown in Figure 12.6. If you look up the total
raw score reported on the form in Figure 12.4 (67) in the table
shown in Figure 12.5, you will see how the total DIQ score was
obtained (all relevant values from Figure 12.4 are circled in Figure
12.5). Similarly, Figure 12.6 was used to convert the raw scores on
the form to their mental age equivalents.
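
Converting a raw score with such a table is simply a lookup. The sketch below is a hedged illustration: it copies a few raw-score and mental-age pairs from the total-score column of the mental age table and assumes that an entry such as 11-03 means 11 years, 3 months. The dictionary and function names are hypothetical.

```python
# A few raw-score -> mental-age pairs taken from the conversion table
# (total score column). "11-03" is assumed to mean 11 years, 3 months.
MENTAL_AGE_BY_RAW_SCORE = {
    46: "11-01",
    47: "11-03",
    48: "11-04",
    49: "11-06",
    51: "11-09",
    54: "12-02",
}


def mental_age_in_months(raw_score):
    """Look up a raw score and return the mental age expressed in months."""
    years, months = MENTAL_AGE_BY_RAW_SCORE[raw_score].split("-")
    return int(years) * 12 + int(months)


print(mental_age_in_months(47))  # 11 years, 3 months expressed in months
```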


PROPERTIES OF TESTS 


In this section we will discuss three properties of tests: reliability,
stability, and validity, as they apply to intelligence or mental ability
tests.


11 To interpret a person's relative position in his or her age group as a
function of the number of standard deviations he or she scores above or
below the mean, it will be useful to refer to Figure 11.1.










Figure 12.5  A Table for Converting Raw Scores to Deviation IQ Scores on the
Short-form Test of Academic Aptitude, Level 3.*





Figure 12.6  A Table for Converting Raw Scores to Mental Age Scores on the
Short-form Test of Academic Aptitude, Level 3.*

    RAW SCORE     MENTAL AGE
    (TOTAL)
       46           11-01
       47           11-03
       48           11-04
       49           11-06
       50           11-08
       51           11-09
       52           11-11
       53           12-00
       54           12-02
       55           12-04
       56           12-05
       57           12-07
       58           12-09
       59           12-10
       60           13-00
       61           13-02
       62           13-04
       63           13-06
       64           13-08
       65           13-10
       66
       67
       68
       69           14-07
       70           14-10
       71           15-01
       72           15-04
       73           15-08
       74           16-00
       75           16-05
       76           16-10
       77           17-05
       78           18-00
       79           18-00
       80
       81
       82
       83
       84
       85

* Reproduced by permission of the publisher, CTB/McGraw-Hill, Monterey,
CA 93940. Copyright © 1970 by McGraw-Hill, Inc. Circled values are from
Figure 12.4.







Reliability

All the major published intelligence or mental ability tests currently
available have high reliability, whether computed as the relationship
among items of a test (internal consistency) or between alternate forms
of a test. Correlations representing internal consistency (that is, the
relationship among items) generally average about .90. Except for
younger children, correlations between alternate forms run at about
.80. (See discussion of stability below.) While both correlations are
high, neither is perfect. This tells us that performance on an IQ test
is influenced by factors other than the test taker's actual level of
intelligence. Chance is one such factor. However, in terms of typical
test criteria, intelligence tests clearly attain a level of reliability
that is satisfactory.


Stability 


As a measure of potential, a person's intelligence or mental ability
can be expected to remain at about the same level throughout most
of his or her life. Consequently, the scores on intelligence or
mental ability tests should not only be reliable, that is, consistent
over short time spans, but stable, that is, consistent over longer
time spans. When you are 20, for example, will you test out at the
same level as you did when you were 10? If intelligence tests
measure a basic characteristic of people, we would expect a reasonably
high degree of stability of IQ test scores.

How are we to determine whether IQ test scores are stable or
not? The most obvious way would be to get a sample group of
children, retest them at periodic intervals, and correlate the scores
from one testing to the next. Bloom (1964) did just this and
obtained a correlation of 0.80 between Stanford-Binet deviation
IQ scores of students at grade 3 and again at grade 12. Such a high
correlation suggests a high degree of stability. However, the
following must be considered. A high correlation indicates a high
predictive relationship but does not necessarily suggest constancy.
Average IQ's from grade 3 to 12 could have increased 30 points, but if
everyone's increase was proportional, a high correlation would
result. More specifically, the world changes in nine years. A general
increase in the availability and quality of education over this
period might enhance the IQs of most if not all students dramatically
and thereby reduce the constancy but not the correlation. If
improvement was vastly uneven, causing only some students' scores to
rise, the correlation would have been substantially lower.
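
The difference between a high correlation and constancy of scores can be illustrated with a small computation. The sketch below uses hypothetical scores for six students (they are not data from Bloom's study) and a hand-written Pearson correlation, `pearson_r`, a helper defined here only for illustration. A uniform 30-point gain stands in for the case in which everyone improves alike.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical grade 3 IQ scores for six students.
grade3 = [85, 92, 100, 104, 110, 121]

# Case 1: every student gains 30 points by grade 12.
grade12_uniform = [iq + 30 for iq in grade3]

# Case 2: only every other student gains 30 points.
grade12_uneven = [115, 92, 130, 104, 140, 121]

r_uniform = pearson_r(grade3, grade12_uniform)
r_uneven = pearson_r(grade3, grade12_uneven)
mean_gain = sum(grade12_uniform) / 6 - sum(grade3) / 6

print(round(r_uniform, 2), round(r_uneven, 2), mean_gain)
```

In the first case the correlation stays essentially perfect even though no score remained constant; in the second, the uneven gains pull the correlation down sharply.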





Hopkins and Bracht (1971) started with first graders and
followed them through high school. They gave them the California
Test of Mental Maturity in grades 1, 2, and 4 and the Lorge-Thorndike
Intelligence Tests in grades 7, 9, and 11. They found
that first grade IQ's were not that highly predictive of subsequent
IQ's, even of those in the second grade. However, fourth grade
verbal IQ's (that is, those scores based only on language portions
of the test) correlated .77 with eleventh grade verbal IQ's,
indicating a high degree of stability starting from grade 4.

In interpreting scores on intelligence or mental ability tests,
we must keep certain points in mind. First, it is not appropriate to
generalize from one IQ test to another. Second, IQ data on young
children have reasonably little stability. Third, that part of the
IQ based on language portions of the test is considerably more
stable than that part based on nonlanguage portions. Fourth,
while studies of IQ stability have been based primarily on
individually administered IQ tests, the tests more frequently used in
education are group administered tests for which less information
about stability of scores is available.

We will return to this issue at the conclusion of this chapter
in the discussion of IQ tests in relation to educational theory and
practice.


Estimates of the predictive validity of intelligence tests are based
largely on the relationship between test scores and (1) success in
school, and (2) level of occupation entered. The issue of biases in
the tests that affect their validity will be discussed in the last
section of this chapter.

Success in School. There is considerable evidence that scores
on IQ tests are related to school performance. This does not mean
that a student's measured IQ determines how well he or she will
do in school; it does indicate that the skills needed to do well in
school are very much like the skills needed to do well on IQ tests.
The relationship between a particular IQ test and school achievement
is usually presented in the manual of that test. Some samples
of these results will be presented. The Cognitive Abilities Test
includes verbal, nonverbal,12 and quantitative batteries of cognitive

12 Note that in various IQ and mental ability tests the part of the test
that does not rely on reading of words is alternatively referred to as
nonlanguage or figural.


Validity 





or mental ability. The Iowa Tests of Basic Skills (ITBS) are commonly
used achievement tests for the elementary grades. They
include tests for vocabulary, reading, language, study skills, and
arithmetic, each with its own subtests. The Tests of Academic
Progress (TAP) are standardized achievement tests used at the
high school level and include tests of social studies, composition,
science, reading, mathematics, and literature. The correlations
between Cognitive Abilities Test scores and composite scores on those
school achievement tests are illustrated in Figure 12.7. For the
verbal scores, the correlations range from a high of .80 in third
grade to a low of .72 in ninth grade. For nonverbal scores, the
correlations remain at about .60 throughout the grade levels. For
quantitative scores, correlations stay at approximately .70. The
correlations are substantial, particularly for verbal and quantitative
ability. This seems to say that mental ability or intelligence
tests and school achievement tests are tapping common or related
skills and that students of high intelligence (as measured by this
test) tend toward better school achievement (as measured by
these tests). This conclusion is least accurate in the nonverbal
area, an area of less emphasis in the school curriculum.


Figure 12.7  Average Correlations between Verbal, Nonverbal, and Quantitative
Scores on the Cognitive Abilities Test and Composite Scores on the
Iowa Tests of Basic Skills and Tests of Academic Progress.*

                           COGNITIVE ABILITIES TEST
          Grade Level    Verbal    Nonverbal    Quantitative
    ITBS       3          .80        .59           .68
               4          .75        .57           .63
               5          .79        .60           .70
               6          .78        .58           .70
               7          .76        .60           .73
               8          .78        .61           .71
    TAP        9          .72        .59           .71
              10          .73        .59           .70
              11          .73        .58           .69
              12          .73        .61           .68

* Reproduced by permission of the Houghton Mifflin Company.





address of the test publisher, the test's costs, and often one or two
critical reviews. (Test publishers' catalogs are also a useful source
of up-to-date information on test availability and costs.)


Individual Intelligence Tests

Individual intelligence tests are administered to children on a
one-to-one basis by persons trained in test administration. Such
tests are often given to children with presumed learning disabilities
who have been referred to the school psychologist or learning
disabilities specialist and for whom group testing would be difficult,
perhaps untimely, and probably not sensitive enough. Although
teachers do not have to know how to administer such
tests, upon occasion they have to interpret the scores. Scores may
be available for classes of students or for those having difficulties
or considered gifted. Testing may be done to make decisions about
school entrance, grade placement, rapid advancement, or to detect
various forms of impairment or maladjustments. Scores on such
tests are usually reported as deviation IQ's and mental ages (see
pages 329-35).

Three widely used individual intelligence tests are the
Stanford-Binet Intelligence Test, Wechsler Intelligence Scale for
Children (WISC), and Peabody Picture Vocabulary Test (PPVT).

Stanford-Binet. This intelligence scale measures the general
mental ability of individuals from age 2 to adult, although it tends
to be used primarily with children from ages 2 to 8. It is a scale
based on age standards of performance and attempts to measure
how high up the scale a child can go before the tasks become too
difficult. A single revised form (L-M) is available with norms based
on data as recent as 1972. It is published by Houghton Mifflin Co.

The test measures skills in seven content categories. Sattler
(1965) has labeled them as follows:

• language. Naming objects in pictures, defining words, naming
rhyming words.

• reasoning. Drawing an orientation, pointing out why a
verbal statement is absurd.

• memory. Remembering sentences, remembering digits.

• conceptual. Explaining a proverb, indicating a basis of
similarity.

• social intelligence. Understanding social identities and
relationships, finding absurdities in pictures.

• numerical reasoning. Making change, ingenuity in solving
a math problem.

• visual motor. Making a form on a form board, copying a
square.

Stimuli used include words, objects, and pictures. Responses are
speaking, drawing, calculating, writing, or other motor acts.

The scale covers a wide number of levels with six subtests at
each age. The examiner starts the child on tasks below his or her
chronological age and then moves the child upward until he or
she reaches a level at which he or she fails all the subtests. About
an hour is required for a typical administration. Intelligence is
expressed as a single overall DIQ score.
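
The upward progression through age levels can be pictured as a simple search. The following is a rough sketch only, not the publisher's scoring procedure: it assumes the child's pass/fail results on the six subtests at each age level are already recorded, and the function name `find_ceiling_level` and the data layout are hypothetical.

```python
def find_ceiling_level(results_by_level, chronological_age):
    """Return the first age level at which every subtest is failed.

    results_by_level maps an age level to a list of six booleans
    (True = subtest passed). Testing starts at the highest level
    below the child's chronological age and moves upward.
    """
    levels = sorted(results_by_level)
    below = [level for level in levels if level < chronological_age]
    start = below[-1] if below else levels[0]
    for level in levels[levels.index(start):]:
        if not any(results_by_level[level]):
            return level
    return None  # the child never failed all subtests at any level


# Hypothetical record for a child aged seven and a half.
record = {
    6: [True] * 6,
    7: [True, True, True, True, False, True],
    8: [True, False, False, True, False, False],
    9: [False] * 6,
    10: [False] * 6,
}
print(find_ceiling_level(record, chronological_age=7.5))
```

The sketch stops at the level where all subtests are failed; converting the pattern of passes below that level into a DIQ requires the norms supplied with the test.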





Wechsler (WISC-R). This intelligence scale by
David Wechsler (1974 revision) has five subtests forming a Verbal
score and five others forming a Performance (or nonverbal) score,
with both together giving the Full Scale score. (Each group also
includes a supplementary or alternate sixth subtest.) These subtests
are all intended to measure general mental ability.


Verbal subtests: examples

• information (recall of knowledge). Given a question, provide
the particular fact that is called for.

• comprehension (understanding of knowledge). Given a
particular object or event, explain one or more of its particular
properties or causes.

• arithmetic (numerical ability). Given a verbal problem involving
numbers, compute a solution.

• similarities (reasoning). Given A and B, determine the
relationship.

• vocabulary (verbal ability). Given a word, tell its meaning.

• digit span (memory, optional). Given three numbers, repeat
them in order.


Performance subtests: examples

• block design (analysis of a complex whole). Given a picture
of a design, arrange four blocks with different markings
to form that design.

• picture completion (analysis of the parts from the whole).
Given a picture, name the part that is missing.

• picture arrangement (identification of the whole from the
parts). Arrange three pictures in the sequence that correctly
tells the story.

• object assembly (synthesis of the whole from the parts).
Put the pieces together to make an object.

• coding (digit symbol substitution). Match the numbers
with their given symbol codes.

• mazes (optional). Find the route through the maze.


This scale usually takes an hour to administer. Subtests
are administered one after the other, with the child being given
subtest items in graded order of difficulty until he or she can
no longer answer them; the next subtest is then begun. Children from
ages 6 through 16 may be tested. (The Wechsler Preschool and Primary
Scale of Intelligence is available for younger children and the
Wechsler Adult Intelligence Scale for adults.) Scores on each





Figure 12.8  An Item Illustrative of Those on the Peabody Picture Vocabulary
Test.*

[Figure not reproduced: four numbered pictures, 1 through 4, with the
instruction: Point to "chair."]

* Reproduced with the special permission of the author, Lloyd M. Dunn.

subtest are reported, as are DIQ's representing the Verbal score,
Performance score, and Full Scale score. It is published by The
Psychological Corporation.

Peabody Picture Vocabulary Test. The PPVT (Dunn, 1965)
is an individually administered intelligence test that uses no words
as stimuli. Each item of the scale is made up of four pictures (see
Figure 12.8 for a sample). The test administrator reads the name
of the object or scene in one of the pictures and the child must
point to the appropriate picture. Paradoxically, although the test
requires no reading, it measures intelligence strictly in terms of
vocabulary, that is, a child's verbal ability or whether he or she
knows what different words mean. For this reason, the test must
be considered one that is highly subject to cultural influences.13

13 Since the same cultural influences may affect school achievement, this
influence may actually enhance the predictive validity of the test (a plus
for the test perhaps, but a minus for school).





The manual suggests a convenient starting point as that at
which the child can get eight consecutive items correct. This
constitutes the child's basal level. He or she continues through the
items, which increase in difficulty, until he or she makes 6 errors
in 8 consecutive items. The number of the last item reached, less the
number of errors, gives the raw score. The test contains a total of
150 items.

ages, te“tion fo^a^d ^ 

pictures provide the basK fnr f e ^ Moreover, the same 
m terms of which of the fo" TtZr^' ^ ‘‘“f- 

reported median alternative fom 1 's correct The test has a 

levels Correlations Seen itTpvtSJrh “L"'/ “2= 

have a median of 0 71 between it, Stanford Bmet 

hgence Scale for Children Verbate Wechsler Intel- 

PPVT and adult Wechsler full scale a between the 

suggest reasonable (concurrent) vahduv Th” ° 
for use with nonreaders within ^ ^^he test is recommended 

--PUh,h.yrt.sp„hhshe'^"rA;tc7r^r,r„ceVe“,“'"^ 


Group Tests of Mental Ability


Group tests of mental ability are the ones ordinarily used in the
classrooms of elementary and secondary schools. We will describe the
more well known among them, particularly those given on a district-wide
basis at one or more grade levels, because these are the ones teachers
are more likely to encounter.


Short form Test of * j 

The SFTAA was develooed f* ''P***ude (SFTAA lovni 

Tp“T::re7„^te-^!7te'^^ 

nieasures verbal^c”* following subtests 

3nd the ability ^P*'el>onsion. knowledge (whicl 

^rp^orS' f ‘ 

■" o aenes of ~ et. 

■ or geometric figures), and 





Memory (recalling facts or ideas, making inferences, and recalling
the logical flow of a story).

Subtest raw scores are provided along with language and
nonlanguage subtotals and an overall total score in the form of
deviation IQ's, percentile ranks, or stanines (see Figures 12.4, 12.5,
and 12.6). In addition, a reference scale score (another standard
score) is also available. The SFTAA is published by California
Test Bureau/McGraw-Hill and may be coordinated with either
of their standardized achievement testing programs or used
independently.

Otis-Lennon Mental Ability Test (OLMAT, 1967). The OLMAT
has six levels as follows: Primary I (grades K.5-K.9), Primary II
(grades 1.0-1.5), Elementary I (grades 1.5-3.9), Elementary II
(grades 4.0-6.9), Intermediate (grades 7.0-9.9), and Advanced
(grades 10.0-12.9). Administration time varies from 30 minutes
for the primary levels to 50 minutes for the other levels. Although
only a total score is reported (consistent with the definition of
intelligence as a general capacity described on pages 313-17), it is
reported six ways: as a raw score; as a deviation IQ score,
percentile rank, and stanine by age; and as a percentile rank and
stanine by grade. The test is constructed to measure verbal,
numerical, and abstract reasoning abilities.

OLMAT is a widely used test, particularly in conjunction with
the Metropolitan Achievement Tests (1970) or the Stanford
Achievement Tests (1973). Correlations between it and a wide
variety of aptitude, intelligence, and achievement tests (as well as
school grades) as reported in the Technical Handbook are quite
high. It is published by Harcourt Brace Jovanovich.

The OLMAT evolved from the various Otis intelligence and
mental ability tests, which were among the first large-scale group
tests used for measuring individual characteristics. Otis originally
designed these tests more than 50 years ago to cast Binet-type
items into paper-and-pencil format.


Cognitive Abilities Test (1968-71). This test, published by
Houghton Mifflin, was developed from the Lorge-Thorndike
Intelligence Tests. It comes in a Multi-level Edition suitable for grades
3-12 and a Primary Edition for K-3. The Multi-level Edition contains
three separate batteries (each of which takes 35 minutes to
complete plus about 15 minutes to administer), which are Verbal





(vocabulary, sentence completion, verbal classification, and verbal
analogies), Quantitative (number series, quantitative relations,
and equation building), and Nonverbal (figure classification,
figure analogies, and figure synthesis). Eight different overlapping
levels cover the third to twelfth grade range. The test
may be coordinated with the Iowa Tests of Basic Skills and/or
Tests of Academic Progress, both achievement test batteries. The
major feature of these tests is that separate scores (DIQ's,
percentile ranks, and stanines) are reported for each battery, thus
providing an indication of verbal, quantitative, and nonverbal
ability.

The Primary Battery (I and II) uses pictorial materials and
oral instructions and includes subtests of oral vocabulary,
relational concepts, multimental (identifying the one that doesn't
belong) concepts, and quantitative concepts. Administration
procedures feature item-by-item pacing to assure that the test is not
one of speed. A shortened version of Primary I is also available.


Cooperative School and College Ability Tests (SCAT Series II, 1966).
These tests, representing a complete revision of SCAT I, are available
at four levels: grades 4-6, 7-9, 10-12, and 12-14, with two
forms (A and B) available at each level. Published by Educational
Testing Service, these "academic aptitude tests" are "intended
primarily as a measure of a student's ability to succeed in future
academic work" (as described in the Handbook for SCAT Series II
along with a raft of psychometric information about the tests).
SCAT II is a good predictor of academic performance as well as of
scores on the College Boards. Like other tests of mental ability,
it has high reliability.

Testing time on SCAT II is 40 minutes with an additional 10
minutes for test administration. There are two subtests, a verbal
one in which all items involve verbal analogies, and a mathematical
one in which quantitative comparison items appear. Verbal,
mathematical, and total scores are reported as percentile ranks
and percentile bands (a score range going from one standard error
of measurement below to one standard error of measurement
above the actual score). IQ scores are not reported.
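
A percentile band of this kind can be sketched as follows. The example is illustrative only: the norm group, the standard error of measurement, and the function names are all hypothetical, and percentile rank is simplified here to the percent of the norm group scoring below a given value (a published test supplies its own norms tables).

```python
from bisect import bisect_left


def score_band(score, sem):
    """Range from one standard error of measurement below to one above."""
    return (score - sem, score + sem)


def percent_below(value, norm_scores):
    """Percent of the norm group scoring below the given value."""
    ordered = sorted(norm_scores)
    return 100 * bisect_left(ordered, value) / len(ordered)


def percentile_band(score, sem, norm_scores):
    """Express both ends of the score band as percentile ranks."""
    low, high = score_band(score, sem)
    return (percent_below(low, norm_scores), percent_below(high, norm_scores))


# Hypothetical norm group: one student at each score from 1 to 100.
norm_group = list(range(1, 101))
print(score_band(50, 5))
print(percentile_band(50, 5, norm_group))
```

Reporting the band rather than a single percentile rank reminds the reader that an obtained score is only an estimate.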


Henmon-Nelson Tests of Mental Ability (1973 Revision).
These tests are published by Houghton Mifflin and are available as
Form 1 and as a Primary Battery for grades K-2. Areas covered by the
90 items in Form 1 are vocabulary, sentence completion, opposites,
general information, verbal analogies, verbal classification, verbal
inference, number series, arithmetic reasoning, and figure analogies.
The Primary Battery includes a Listening Test, Picture Vocabulary
Test, and Size and Number Test. Each form takes 30 minutes and for
each, DIQ's (by age), stanines, and percentile ranks (by age and
grade) are reported.

Primary Mental Abilities Tests (PMA, Revised 1962). These
tests, developed originally by L. L. Thurstone and published
by Science Research Associates, are based on the definition of
intelligence as a group of traits originally proposed and
operationalized by Thurstone himself (see pages 317-18). Batteries are
available for grades K-1, 2-4, 4-6, 6-9, and 9-12 and measure
(1) verbal meaning (ability to work with words), (2) number
facility (ability to work with numbers), (3) spatial relations
(ability to conceptualize), (4) perceptual speed (ability to distinguish
size and shapes, grades K-6 only), and (5) reasoning (ability to
think logically, grades 4-12 only). These represent a variation on
Thurstone's original six factors listed on page 318. Testing time is
listed as approximately 1½ hours and scores are reported for
subtests and for the total battery as DIQ's (called "deviation
ability quotients" for these tests) and percentile ranks. Ratio IQ,
mental age, and stanine scores are also available for some of the
batteries.

Like many of the other mental abilities tests, this test has
strong historical roots. Unlike most of the others, it reports
discrete factor or subtest scores, reflecting its basis in the definition
of intelligence as a group of traits rather than as a single general
one.


Culture Fair Intelligence Test (IPAT, 1961-63). This test was
developed by R. B. Cattell and is published by the Institute for
Personality and Ability Testing, hence the designation IPAT. It
can be administered to both children and adults, and in each case
has three scales. The scales take from 30 minutes to an hour to
complete and claim to be relatively independent of school
achievement, social advantages, and environmental influences. They
attempt to measure fluid ability (the general intelligence factor
as defined on pages 313-17), which presumably manifests itself
through adaptive mental behavior in situations so unfamiliar that
previously learned skills can be of no help in guiding such
behavior. To accomplish this, the test includes nonsense (meaningless)
material, universally unfamiliar material, and commonplace
material. The evidence on whether this test is free of cultural bias
is mixed. Some sample items appear in Figure 12.9.


Figure 12.9  Sample Items from the Culture Fair Intelligence Test.*

[Figure not reproduced: three sample items, each with five answer
choices and an answer box.]

Choose from the 5 boxes on the right the one that ought to go into the
dotted empty box, that is, the one that would go after the first three.

Which of the 5 figures is different from the others?

Which one of the 5 boxes on the right would be the one to fill the dotted
empty box and make the big figure look right?

* Copyright © 1949, 1961 by IPAT, Champaign, Ill. Reproduced by permission.






Differential Aptitude Tests (DAT; Fifth Edition, 1973). These
tests, developed by G. K. Bennett, H. G. Seashore, and A. G.
Wesman and published by The Psychological Corporation, are
designed to measure specific aptitudes. Their subtests include
verbal reasoning, numerical ability, abstract reasoning, clerical
speed and accuracy, mechanical reasoning, space relations, spelling,
and language usage. However, if one follows the trait-factor
conception of intelligence described by Thurstone (see pages
317-18), the subtests of the DAT can be thought of as including
within them the measurement of mental ability. In fact, the tests
provide a measure of general scholastic aptitude (which can be
interpreted as mental ability) by combining scores on verbal
reasoning and numerical ability subtests.

Scores on each subtest of the DAT and on verbal and numerical
combined are presented as percentile ranks and percentile
bands in a profile such as that shown in Figure 12.10 on the next
page. This information can be used to help students decide which
courses to take in school and the kinds of occupations to explore
as career possibilities as part of a Career Planning Program.
(Career relevant tests such as interest tests are described in
Chapter 14.)

Two forms, S and T, are available for use with students in
grades 8-12. The complete battery takes about three hours to
complete.

The choice of a classroom administered IQ test is not ordinarily
made by the teacher. Often it is part of the total district-wide
testing program and is based on the achievement battery
chosen for use; that is, the companion IQ test to the achievement
battery is used. Moreover, choosing among the various
group intelligence tests is a difficult task. The clearest form of
variability is on the verbal-nonverbal distinction. Beyond that,
most available and commonly used group tests, like those
described, utilize the same general types of items (except perhaps
for a nonlanguage test like the IPAT), report the same types of
scores, feature the same strengths in terms of quality of
standardization, have approximately the same degree of predictive
validity (that is, predict school success, again except perhaps for the
nonverbal tests), and fall heir to the same shortcomings. Perhaps
the major difference is in an orientation toward intelligence as a
single general factor or as a group of traits, a difference reflected
in the number of specific subtest scores reported.





Figure 12.10  A Sample Profile of a Student's Differential Aptitude Test
Scores.*

[Figure not reproduced: a profile chart plotting the student's subtest
scores against a percentile scale.]

* Reproduced by permission. Copyright © 1973 by The Psychological
Corporation, New York, N.Y.






TESTS OF CREATIVITY

Although not as widely used as mental ability tests, tests of
creativity measure a different aspect of the mental process. Guilford
(1967) has emphasized the point that creativity requires divergent
thinking rather than the convergent thinking measured by mental
ability tests. In convergent thinking, one seeks to reduce much
information to the one correct answer. In divergent thinking, one
attempts to expand from a small amount of information to many
correct answers. Consistent with this definition, tests of creativity
require students to construct novel but appropriate responses to
given situations. Such multiple and imaginative responses cannot
be scored by computer because their individual acceptability must
be judged. For this reason, and also perhaps because of a lack of
acceptance of creativity as a legitimate educational goal, tests of
creativity have not enjoyed widespread use in the schools. Two
tests of creativity will be briefly described below.

The Torrance Tests of Creativity (1966). These tests have
both a Verbal and a Figural section. The Verbal section includes
(1) ask and guess: the student is given a picture and asked (a) to
guess what might have led to the scene pictured and (b) to guess
what might happen next (as in all the Torrance tests, the student
is asked to produce as many answers as he or she can); (2) product
improvement: a picture of a toy is shown and the student is
asked to suggest changes that would make the toy more fun to
play with; (3) unusual uses: the student is asked to list as many
unusual uses as he or she can think of for a commonplace object
such as a tin can. The Figural tests ask the student to make
drawings. One Figural subtest gives the student something to start
with, such as a page of circles, and asks him or her to make as many
different pictures as possible, with the circle a key part of each
picture. Working time for each half is 45 minutes and the test can be
used at most any elementary grade level (K-8).

Scoring requires some degree of training. Three scores are
given: fluency (total number of acceptable responses), flexibility
(number of categories in the manual used by the student), and
originality (number of responses not on the list of frequent
responses). Norms are not provided; each student's responses are
evaluated against a set of common criteria, hence these tests may
be considered criterion referenced.
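
The three scores can be pictured with a small tally. The sketch below is an illustration under stated assumptions, not the Torrance scoring manual: it supposes a trained scorer has already decided which responses are acceptable and assigned each to a category, and every category name, response, and function name in it is hypothetical.

```python
def score_responses(responses, category_of, frequent_responses):
    """Tally fluency, flexibility, and originality for one student.

    category_of maps each acceptable response to a scoring category;
    responses missing from it are treated as unacceptable.
    frequent_responses is the list of common (unoriginal) responses.
    """
    acceptable = [r for r in responses if r in category_of]
    return {
        "fluency": len(acceptable),
        "flexibility": len({category_of[r] for r in acceptable}),
        "originality": sum(1 for r in acceptable if r not in frequent_responses),
    }


# Hypothetical "unusual uses for a tin can" scoring aids.
categories = {
    "pencil holder": "container",
    "flower pot": "container",
    "drum": "musical instrument",
    "telephone": "communication",
}
frequent = {"pencil holder", "flower pot"}

student = ["pencil holder", "drum", "telephone", "flower pot", "eat it"]
print(score_responses(student, categories, frequent))
```

The judgment of acceptability remains the scorer's; the code only counts once that judgment has been made.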





The validity of the Torrance tests as measures of creativity has
not been clearly established. The evidence is mixed. Yamamoto's
(1963) results suggest acceptable validity (i.e., a 0.50 correlation
between Torrance score and evaluation of imaginative stories for
40 fifth graders) while Wodtke's (1964) attempted replication holding
IQ constant suggested otherwise (r = 0.24). The work on creativity
and intelligence by Getzels and Jackson (1962) and Wallach
and Kogan (1965) suggests that the concepts of IQ and creativity
can be measured separately. The Torrance tests are published by
Personnel Press.


Remote Associates Test (1971). In the 40-item RAT, a high
school student is presented with three words (e.g., base, snow,
dance) and asked to supply a fourth somehow related to each of
the three (e.g., ball). The three words are considered to be from
"remote associative clusters" and the fourth word to provide
a "mediating link."

This test, authored by S. A. Mednick and M. T. Mednick and
published by Houghton Mifflin, is untimed but takes about 40
minutes. Its norms are somewhat limited but its reliability
(split-half) is high. However, its major weakness may be in the area
of validity, where no data on high school students are cited (Backman
and Tuckman, 1972). IQ and RAT correlations tend to be
about .40. Worthen and Clark (1971) contend that the RAT is
primarily a measure of sensitivity to language structure rather
than a measure of creativity. Backman and Tuckman (1972)
recommend that this test be restricted primarily to research uses
rather than used for individual placement decisions until more
validity data are available.


ISSUES IN INTELLIGENCE TESTING 


There is a great and continuing debate going on m intelligence 
testing that questions its very basis and utility The critics of 
intelligence testing claim that it adds nothing to the educational 
process and may subtract quite a bit Their claim is based on their 
contention that intelligence tests discriminate not between the 
more mentally able and the less mentally able (and in more ex- 
monl hr" -s intended, but between 

Sheme'’";® Ojltural experiences They contend that 

virtue of income status, race, or other personal characteristics)





from sharing in full educational opportunity. The questions seem
to be: What do scores on the test mean? Do the tests measure
intellectual capacity, do they measure relevant experience, or
both? And of what educational use are the scores? Since
it is important in the study of measurement to consider both "how
to" and also "should we," these questions deserve attention.


The Meaning of the Scores

Do scores on an IQ test reflect how bright a person is or the kinds
of experiences he or she has had? Jensen (1968, 1969) cites evi-
dence to show that identical twins, regardless of whether they are
raised in the same home or in different homes, have intelligence
test scores that are closely related, much more closely related than
are the scores of children who have no biological relation to one
another but are raised in the same home. In the study, Jensen
shows that people low in socio-economic status score lower than
people at middle and high socio-economic levels. He also cites
evidence to show that blacks score lower than whites on IQ tests,
such as Progressive Matrices, that would not seem to relate to
specific cultural experience.

In truth, it must be said that we do not know how much a
person's genes contribute to his or her scores on an intelligence
test and how much his or her life experiences contribute. We can
only conclude that an intelligence test score reflects in part a
person's born capacity and in part the kind of experiences he or
she has had and been conditioned by. Moreover, such experiential
effects may be considerably more subtle and more pervasive than
we have heretofore thought. The more meaningful question for
education may not be what the scores mean but what their educa-
tional value is, regardless of what they mean. However, the ten-
dency to define performance in terms of a composite of mental
abilities may help to overcome some of the indeterminacies of
the intelligence concept.

The Educational Value of the Scores

The purpose of education is to enable children to learn and grow.
Any educational practice that contributes to this purpose pre-
sumably should continue; conversely, any practice that adds
nothing to or detracts from this purpose should be terminated.
Regardless of what IQ or mental ability tests measure, it is pos-
sible to question the general educational value of the scores when
used on a wide scale. Although it is true that the scores are often
related to subsequent school achievement, it is possible that the





scores influence the very outcomes they are intended to predict,
tests for which all children have not been equally well prepared


Box 12.3


PUTTING INTELLIGENCE TESTING IN
MODERN PERSPECTIVE


We are not so certain of what intelligence is, how to measure it, what
its properties are, or what it should be a prerequisite for that we can use it
as a basis for making major life decisions for children in perhaps other
than extreme cases. Controversy is currently raging over whether intelli-
gence as it is presently used is a universal or culturally-bound concept.
Anne Anastasi, a former president of the American Psychological Associ-
ation and expert in differential psychology, offers the following five points
on the way that intelligence may best be viewed:

1) First, intelligence should be regarded as a descriptive rather
than an explanatory concept. An IQ is an expression of an individual's
ability at a given point in time, in relation to his age norms. No
intelligence test can indicate the reasons for his performance.

2) Second, the IQ is not fixed and unchanging, and it is amenable to
modification by environmental interventions.

3) An individual's intelligence at any one point in time is the end
product of a vast and complex sequence of interactions between
hereditary and environmental factors.

4) Intelligence is not a single, unitary ability but a composite of
several functions. The term is commonly used to cover that combination
of abilities required for survival and advancement within a particular
culture; and finally,

5) An individual's relative ability will tend to increase with age in
those functions whose value is emphasized by his culture or subculture,
and his relative ability will tend to decrease in those functions whose
value is deemphasized.


A. Anastasi, Psychological Testing, 3rd ed. Reproduced by permission of
Macmillan Publishing Co. © 1968, p. 211. All rights reserved. No part of
this book may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying, recording, or by any
information storage and retrieval system, without permission in writing
from the publisher.





in which case, the scores can be used to provide supplementary 
instruction in the areas measured by the tests to students whose 
performance is lower than that of their age-mates. There is reason 
to suspect that such instruction, if done well, will cause such
students to perform better on intelligence tests. If this is to be 
done simply for the sake of scoring high on a test, it should not be 
done at all. If, however, it is to be done in order to improve a 
child's chances of subsequent academic success, it should be done 
systematically and with vigor. The far wiser course may be to 
focus on achievement testing as the more profitable and meaning- 
ful form of educational testing for the majority of school children 
and thereby focus intelligence or mental ability testing on the 
identification of exceptionality for purposes of enhancement or
remediation. 


Additional Information Sources 

Barron, F. The measurement of creativity. In D. K. Whitla (Ed.).
Handbook of measurement and assessment in behavioral sciences.
Reading, Mass.: Addison-Wesley, pp. 348-66.

Bloom, B. S. Stability and change in human characteristics. N.Y.:
Wiley, 1964.

Freeman, F. S. Theory and practice of psychological testing, 3rd ed.
N.Y.: Holt, Rinehart, and Winston, 1962.

Hawes, G. R. Educational testing for the millions: What tests really
mean for your child. N.Y.: McGraw-Hill, 1964.

Hoepfner, R. & Klein, S. CSE elementary school test evaluation. Los
Angeles: Center for the Study of Evaluation, UCLA Graduate
School of Education, 1970.

McClelland, D. Testing for competence rather than for "intelligence."
American Psychologist, 1973, 28, 1-14.

McNemar, Q. Lost: Our intelligence. Why? American Psychologist,
1964, 19, pp. 871-82.

Nelson, M. J. Intelligence and special aptitude tests. In R. L. Ebel 
(Ed.). Encyclopedia of educational research, 4th ed. N.Y.; Mac- 
millan, 1969, pp. (tbl-n. 

Tarczan, C. An educator's guide to psychological tests: Descriptions
and classroom implications. Springfield, Ill.: Chas. C Thomas,
1972.

Tuddenham, R. D. Intelligence. In R. L. Ebel (Ed.). Encyclopedia of 
educational research. 4th ed. N.Y.: Macmillan, 1969, pp. 654-64. 





Self-test of Proficiency

(1) For each of the definitions or descriptions of intelligence on the left,
match up the appropriate illustration on the right.


a general intellectual , improvement as the result of in- 

„ struction 

0 aSah.ri!"^ '' •I’® 

=Lmeasures ^ 

9 schTdpertolmance ' “ "'"f® that 

h two level process 

vi ability to deal effectively with
the environment

vii the score on the measuring in-
strument itself

viii the combination of an operation
on a content that yields a
product

ix composed of a number of dis-
crete mental abilities

FALSE 

(3) Which of the following should not be included when
preparing a composite definition of intelligence?

a verbal comprehension
b figural comprehension
c drawing skill
d inductive reasoning
e spatial relations
f musical ability
g two-hand coordination
h numerical skill
i two-point threshold
j visual acuity

‘F and 

"“'Sned to reinforce or 




require these skills. To what does the italicized portion of this
definition refer and what does it mean?

(5) Consider the following item:

Which word does not belong? chair table pencil bed

This item is illustrative of the item category
a word substitution
b synonyms
c word classification
d verbal analysis
e recognition of objects

(6) Consider the following item:

Which alternative at the right is most nearly like the illustration
shown at the left?

This item is illustrative of the item category
a figure matching 
b spatial visualization 
c recognition of objects 
d identical figures 
e figural relations 

(7) a Ted's performance on an intelligence test was the same as that
of the average 7-year-old child. Seven is Ted's
(chronological, mental) age.

b Ted has just celebrated his 7th birthday. Seven is Ted's
(chronological, mental) age.

c Based on the above information, we can say that Ted's
(ratio, deviation) IQ is 100.

(8) Miranda, who has just turned 11, took the Short Form Test of Aca-
demic Aptitude and obtained a total raw score of 47.

a What is Miranda's deviation IQ?

b What is Miranda's mental age (in years and months)? (Use
Figures 12.5 and 12.6 on pages 336 and 337.)








e true, thereby contributing 


(9) Reliability coefficients (based on internal consistency) for the major
intelligence or mental ability tests tend to average about

b 83 ° « 

® d 6B 

™Ter •''= 'Q --es 

a IQ scores are unpredictable at ail ages 
Third grade 10 scores cannot be used to predict scores at grade 

d FcuTgl?, "" IQ 

e None of the above scores at grade 11 . 

' irsrr --- - 

tests ^'“‘'Fhls who score lower on IQ 

' ~:arre„^^ro°s:o? r" 

9 Persons in the prSons ,!^h ? 
persons in the trades ^ ° “h 1° 'ests than 

e w“::~°„dfc::“r 

'earning are negatively correlated^^' 'e"ecling amount ol school 

oI'ml'elNger=e let" "p '“"“wing list of mental ability 

' ^'^",'’P'»'P'e>“'e Vocabulary 
It SCAT 

Test 

e Stanloid-Binei 

P WhichSareln’r'’ “"""'e'e'ed? 

P Which ones use 

' -- -!:irp.rr "" 

'ehve scoies, " 'PPe'e'e veibal, „c„ve,ba,, and quanll- 


vi SFTAA
vii IPAT Culture Fair Test
viii WISC
ix Otis-Lennon Mental
Ability Test
x Henmon-Nelson





(13) Which one statement below correctly distinguishes between the 
Stanford-Binet and Wechsler intelligence scales? 

a S-B has been recently revised, WISC has not 
b S-B can be used with children, WISC only with adults 
c S-B comes in a single form, WISC comes in 2 forms
d S-B is individually administered, WISC is group administered
e S-B provides only a single score, WISC provides 3 scores 

(14) The Torrance Tests of Creativity require students to 

(15) The Remote Associates Test requires students to 

(16) The score a person gets on an intelligence test is a function of his
or her

a genetic make-up 
b environmental experiences 
c both of the above 
d neither of the above 
e nobody knows 

(17) Among the following, which is the best educational use to which IQ 
scores can be put? 

a to help rid the schools of children that get into trouble
b to identify children who need enhancement or remediation 
c to separate children into ability groups or tracks 
d to help teachers to know what school performance to expect from 
each child 

e to help children become aware of their own deficiency

(18) Among the following, which is the least desirable way to increase a
person's intelligence test score?

a increase the value of intelligence in his or her culture
b provide training in the areas covered by the test
c relate the test more closely to his or her culture and vice versa
d use tests that place less reliance on verbal ability
e make the person and his or her teachers aware of the low score



chapter thirteen /Measuring 
Achievement with Published 
Test Batteries 


OBJECTIVES 


batienes (le penerai survey achievemeni 

arts, reading,' mathema^Lcr'^s™“l' 

encc and Study skills) ' studies, sci- 

“rdized‘^rtSh“'bulu"™“ stand- 
approprialeness, validitv °n 

bilily, and usability “'’■'■•y. mterpreta- 

' Distinguish between net 
ability tests and and their T™"' '"“•a' 

I “"“’a ■“"•iPPPiP of ability teshng°f 'Paaificity 

achietOTcnt torscoles° standardized 

a. raw score, b. percentile rank, c. stanine, d. grade equivalent,
e. standard achievement score, f. anticipated achievement
score; use item performance to interpret results

6. Distinguish between criterion-referenced tests
and norm-referenced tests

7. Identify the characteristics of tests for the mea-
surement of reading skills


WHAT DO STANDARDIZED ACHIEVEMENT TESTS MEASURE?

Standardized achievement tests enjoy a widespread use in the
public schools of this country. Many school systems administer
these tests once a year to all students or at least to all elementary
school students. The first question that must be dealt with in
understanding standardized achievement tests and their role in the
schools is what they measure.¹


General Achievement in Reading

Reading skill is generally measured in terms of vocabulary, word
analysis, and reading comprehension.

Vocabulary. In vocabulary tests, respondents are given a
definition or synonym and asked to identify the word that is a
synonym or that fits or completes the definition.²

• If you tear a piece of paper, you
rip it    cry about it    stain it
o    o    o

¹ The description of achievement tests in this chapter is based largely on an
analysis of the major achievement batteries: Stanford Achievement Test
(1973 edition), California Achievement Tests (1970 edition), Comprehensive
Tests of Basic Skills (1973 edition), the Iowa Tests of Basic Skills (1972
edition), Tests of Academic Progress (1964 edition), and Metropolitan
Achievement Tests (1970 edition).

² All examples are patterned after actual achievement test items. Unless
otherwise specified, the first sample item in a pair is an early elementary
item (i.e., grades 1-3) and the second item a middle school or junior high
item (i.e., grades 7-9). At the lower levels, test items and answer choices
are read aloud by the teacher while students have answer choices before
them at the same time. Answers to sample items appear at the bottom of
page 372.




• You could best describe an extrovert as being
a pensive
b exuberant
c outgoing
d self-conscious

Word Analysis. This type of test at early grade levels re-
quires a student to correctly identify a word that he or she hears;
more specifically, to distinguish a word given orally from poten-
tially confusable forms. Items often require both listening and
reading ability, together with the ability to distinguish from
among the confusable forms. For example:

• SAID (read aloud by teacher)

seed sod sad said (seen by student)

o o o o

At the later elementary grades, this test may also deal with
the similarities between sounds in words, asking the respondent
to select a word that has the same sound as the underlined part of
a given word. For example:

• rhyme    happy rich pile

o o o

For older students, word analysis items also often require stu-
dents to select the sound that is the same as a designated part of
a given word.

• The underlined sound in particular (per tik' yoo ler)

17 rhymes with the a in fan
18 is pronounced like the a in play
19 has the e in penny sound
20 has the u sound in curtail

As you can see from the various illustrations, word analysis
focuses on the sounds and structure of words.

Reading Comprehension. Reading is sometimes measured in
part by determining whether the student can identify the word
that fits a given picture. A first grade item might look like this:

o bat
o mat
o but






At higher grade levels, a story is provided, which the student
reads. He or she must then identify the correct answer to a ques-
tion based on the content of the given story. A fifth grade item
might be as follows:

• The sea otter, who lives in the Pacific Ocean, is covered with
a coat of beautiful and valuable fur. Between 1700 and
1910, hunters and trappers tried to kill the otters. They
would then skin them and sell their fur. In order to pro-
tect the otters from being wiped out, a law was passed in
1910 that said that people couldn't kill otters.

People killed sea otters
1 because it was fun to do
2 to get their fur
3 to keep them from eating up all the fish
4 to keep them from spreading

From the story we can tell that
5 all the settlers were so greedy for fur that they would do
anything to get it
6 you can't buy an otter fur coat today
7 by 1910 there was a danger that no otters would be left
8 between 1700 and 1910, more otters were caught than any
other furry animal


General Achievement in Language

A considerable number of the items in achievement batteries are
devoted to the measurement of language skills, for example,
spelling, mechanics (or grammar), and usage.

Spelling. In the multiple choice, machine scorable format of
published test batteries, respondents cannot be tested for spelling
by having them actually construct the spelling of a word. Hence a
form or forms of the word must be given (often in a sentence that
the person administering the test reads aloud), and then the re-
spondent is asked to choose the correct spelling of a given word
or indicate whether a given spelling is correct or incorrect. For
example:

• Don't you believe me? believe (read aloud by teacher)
believe RIGHT WRONG (seen by student)





“ “ S*™ ““ °"<= 'correctly spelled word 

respondent to distinguish the incorrectly spelled word
from the correct ones (as illustrated below).

• Which word is spelled incorrectly?

5 temperature 7 athletic
6 sophmore 8

PuncmatripimS™ Zt"? - 

the respondent is eiven an i’ sentence structure For example, 
and asked to tdennfv th& uncapitalized sentence 

te.ionforeac,f;ZiflL7Z^^^^^^ 


' “■‘‘yPP fed the book I left at your house 


rs bT' T r"“ 

The boys all laughed at Jimmy 

“"dmcompteteslnlmc^ *° distinguish between complete 

’ !!f ""ich he tned 

complete sentence mo i 

A '^complete sentence 

B 

Usage Languaee iica<ro 

tt^Z ^^'^rtorpSLTZ 

proper end •mprtpetZZa'f wo^’’’ “ ‘‘•^’’"^ishmgZt'ween 

1 % frra'd she Srt “ '““■'“■■d sentence’ 

• Yesterday we 

2 f- o- club 

3 chose 

Will choose 






' Look. Tom, look It Is good Yes Good The end Published by 
Kaleidoscope Enterprises, 432 Schermerhom Boulevard' 


General Achievement in Mathematics

In addition to language skills and reading, mathematics is a com-
mon component of standardized achievement batteries. Most bat-
teries subdivide mathematics into two or three components with a
variety of labels. For our purposes we will use the terms computa-
tion, concepts, and applications to describe achievement in mathe-
matics.

Mathematics Computation. This type of test deals with the
respondent's ability to add, subtract, multiply, divide, determine
equals and unequals, take square roots, and perform basic mathe-
matical or arithmetic operations in general. For example:

• 12 - 4 =    16 9 8 7

o o o o





General 
Achievement In 
Studies 


• Thelowestcommondenommatorof 14 and >4 is 

6 12 13 24 

A B C D 

Concepts. Math concepts are more difficult to 
define than computation because they cover a wide area of mathe 

ent to demonstrate his or her understanding of the basic lules 

ZsZtfZT ItT'' ‘^-4caL,s tiZ u uohe 

•>'= ^ntputatton level 

surentenOForexTmpk ° “ 

• What number is three hundred seventy-two?

327 372 30027 30072

o o o o

• What is the reciprocal of the integer y?

o y y-i 


must demonstrate ^thlTuZders^Z'dZg respondents 

== *’ “54 e; ;=ss 

for a dime 5ul rhEffaUjai' “,"'f ^ P'™5 oF sum 

of gum> 12 jelly beans and 12 pieces 

'o‘ ^ 70e 

rate, how far coSd RosaTnve°in''a ™/r 

° H 

Social studies, or social „ 

« specific skills such as 





map and graph reading. Respondents are called upon to identify
the correct piece of factual information regarding history, geogra-
phy, or culture, as illustrated in the following item for third
graders:

• The largest of the 50 United States in terms of area is

(A) Texas (C) California
(B) New York (D) Alaska

Seventh graders might be asked:

• Which of the following people was not an inventor?

5 Thomas Edison 7 Alexander Bell
6 John D. Rockefeller 8 Robert Fulton

In another kind of item, respondents are called upon to
identify the choice that represents the correct interpretation of a
social science problem or situation. Fifth graders, for example,
might be asked:

• Which of the following could be considered a political
slogan?






General 
Achievement in 
Science 


f 'he apphcanan of 

* fo'' ^ factory Which one 

nahese^lhree .aps shows a place where fac7o ^^duc": 



(0 identify the correct respondents to 

Ae correct explanation or dpptauT '’^‘"°”eimi. (2) identify 

lZf%’‘'°’’‘‘"'°’"^‘^riptZioTal^ “ "> 'he soli 

''rate the application of %ecifil ruf (3) demon- 

r mlT 

® fro™ *= suns 

V§) Uranus (n^ 

• Twuballs.dandB a! a"™ 

St the same time Both ''°“f °f u budding 

reach' ‘hree p ct^ '"'“‘*= °f ^“bd steel Which 

“uhmg the grounds “ sucurately shows them 





- — the black 
page was madt 








The height common to the greatest number of class mem-
bers is

A 3½-4 feet   B 4-4½ feet   C 4½-5 feet   D 5-5½ feet


General Achievement in Study Skills

Some achievement tests provide separate subtests to measure what
is called study skills. Others include this area in subtests like sci-
ence or social science; some may not include it at all. Study skills
refers to a student's ability to identify the correct information
from printed and graphic materials. Presumably, if a student has
learned how to use a library and reference sources contained in it,
such as a dictionary, encyclopedia, atlas, and so forth, he or she
will be able to correctly identify information taken from these
sources. The last examples for both social studies and science
illustrate measurement of study skills within a subject area test.
Some additional examples of study skills items from the fifth
grade level appear below. (The first two illustrate library skills;
the third is a map reading item.)

• Which of the following books would you look in to find a
history of commercial fishing?

atlas encyclopedia dictionary almanac
A B C D

• immediate (i me'di it), adj. 1. occurring or accomplished
without delay; instant: an immediate reply. 2. pertaining


to the present time or moment: our immediate plans. 3.
having no time intervening: the immediate future. 4. having
no object or space intervening; nearest or next: in the
immediate vicinity.

• Which of the above definitions fits the use of the word
immediate in this sentence?

Because I am in a hurry, I must have an immediate answer.
1 2 3 4

• In which direction would
you go to get from the airport to the lake?

1 northeast 3 northwest
2 southeast 4 southwest

General Achievement in Listening

Some current achievement tests include the measurement
of listening comprehension, the ability to pick out the correct
answer to a question or story presented orally.
While the question or story is presented orally, the answer
choices may be given in pictorial form, or they may only
be given orally as well. The following example (sixth grade level)
illustrates one of these formats.

• They often work together. They are working on a project
together now. This project is for their teacher.

Answers' — — 

,e, that 

arnoe students do proiecls for 





Which statement is based on the story? (The statements
are read by the student.)

A Blair is doing most of the work on the project
B Jackie keeps bothering Blair when she is trying to
work
C Some second grade students do projects for their
teachers
D The library is a good place to work on a project


Why These Tests Are Called Standardized

It is useful to examine the procedures that a standardized achieve-
ment test battery goes through to earn the label "standardized."
In the next section, using the five test criteria described in Part III,
we will compare standardized achievement tests with teacher-
built tests.

Tryout, Item Analysis, and Revision. The first two steps in
the construction of a standardized achievement test are the same
as those in the construction of a teacher-built test, that is, the
developing of a content outline and the writing of items. In the
case of the standardized test, more than one person may write the
items. Here the selection of content or objectives is based on an
examination of existing curriculums and textbooks used through-
out the nation.

As part of its development, the standardized test is tried out
on a sample of students, not to measure their achievement but to
determine the properties of the items themselves. The results of
the tryout are then used to eliminate items that are too easy, too
hard, ambiguous, poorly worded, or inconsistent with the majority
of like items. Some decisions are based on statistical results while
others are based on comments by teachers and students. After the
item writers see the results of the tryout and subsequent analysis,
they can tell which items are poor. The poor items are eliminated,
since more items are used in the tryout than would be needed in
the final instrument.
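
The statistical side of a tryout can be sketched in a few lines of code. The sketch below is an illustration only, not the procedure of any particular test publisher. It computes each item's difficulty (the proportion of tryout students answering it correctly) and a simple consistency index (the correlation between the item and the total score on the remaining items), then flags items for review. The function names and the cutoffs of 0.20, 0.90, and 0.15 are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either list has no variance."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def flag_items(responses, too_hard=0.20, too_easy=0.90, min_consistency=0.15):
    """Flag tryout items for review.

    responses: one list per student, 1 = correct, 0 = incorrect.
    The three cutoffs are illustrative assumptions, not published standards.
    Returns {item_index: [reasons]} for flagged items only.
    """
    n_items = len(responses[0])
    flagged = {}
    for i in range(n_items):
        item = [student[i] for student in responses]
        # Score on all the other items, so the item is not correlated with itself.
        rest = [sum(student) - student[i] for student in responses]
        difficulty = mean(item)            # proportion answering correctly
        consistency = pearson(item, rest)  # agreement with the other items
        reasons = []
        if difficulty < too_hard:
            reasons.append("too hard")
        if difficulty > too_easy:
            reasons.append("too easy")
        if consistency < min_consistency:
            reasons.append("inconsistent with other items")
        if reasons:
            flagged[i] = reasons
    return flagged


# Hypothetical tryout data: six students, four items.
tryout = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(flag_items(tryout))
```

In this made-up data set the first item is answered correctly by everyone and the last item by almost no one, so both are flagged. Judgments about ambiguity or poor wording still depend on the comments of teachers and students, which no statistic replaces.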

For example, the following item was contained in the original
version of a social studies test:

• Ponce de Leon was

a a governor of precolonial Florida
b the discoverer of the fountain of youth
c a conqueror of Mexico
d a famous pirate







^o“ron Ihe test^'!, ‘h'l ‘T" “‘=” 

chance level. A close examination of choice b revealed that
the item inadvertently contained a trick. While Ponce de Leon
searched for the fountain of youth (and hence is associated with it),
he never succeeded in discovering it and hence cannot be called its
discoverer. He was, however, a governor of precolonial Florida, a
fact that most people do not know.

better if choice a had been *M ^^r 'could have been 

wrong) and h ' sefc „TL f (which is 

3.-Ttteret;rcon^ 

-P, and graded m 

are being Sebped1a“fXlan?ard°‘' 

lalration is also being develo~d ,^ f'”' 'heir admin- 
other procedural reqnimmenTrlm ‘he‘ ‘'"le limits and 

thi« administrative pmcedS^es asrd?fff"V"?‘ use to use, 
“aiI o7:h'“‘.^"'^““''-=^™"abr"^"'“‘ Of 

b"°l, ha ‘cacher builds, that is'^ihe m!?/oliieiement test from the 
si7is tests are used onbou ’’ °f °f‘“ ‘aacher- 

Ilme s/h agiIIX“S foe and, on those rare occa 

for multin? ’ ^^^tlardized achievem^ foacher or perhaps the 
achir'S?;,"’ -hoolstS -tatmcted 

'alue to ,!s be used bm oir f an 

prated use E™7,h™"" atandardirato ’ hT f h= >‘ttle 
Et'es these processes value ' ““Ittple and re 

The Existence nf v« 

™7Sarfeed ttPnc 

form of norms' Wh' availability of com 'hem the label 
they uTuallv • '"“"hersadmfnisLr.h'"'"’'™ hata in the 
to Cialuate the sifn”"* ‘h^ results after^if °L™ achievement 
aclfeiemen.iestff"'^ took .hf f fave osed them 
bans for the imlllrl, “"“t'a-l and nsoH , ’ standardized 

^ on pages 279-93 



Standardized and Teacher built Achievement Tests 


375 


ence the performance of students in terms of members of a specific
norming group who have taken it before them. It is also
possible to compare their performance to all those in their own
district who have taken the test in the past or who have taken it
at the same time. Widespread test use contributes to the availability
of concurrent data both nationally and locally. (The interpretation
of norm-referenced standardized achievement test scores
will be discussed on pages 385-91.)

After a standardized achievement test is constructed and
edited, its developers administer it to a national sample of students
who will serve as the norming group. This presumably
representative group will serve as the basis for the establishment
of national norms on the test, clearly marking the test as a
standardized one. The same process will also be gone through
for each test revision.


COMPARING STANDARDIZED AND TEACHER-BUILT
ACHIEVEMENT TESTS

School achievement can be measured by either a teacher-built test
or a standardized test. Teachers use their own tests to measure the
achievement of their own students on their own objectives. School
districts typically use standardized tests to get a broader view of
student attainment throughout the district. While teacher-built
tests may be given as often as daily or weekly, a standardized
achievement test battery will probably be given only once or at
most twice a year. It should be useful to compare the two types
of tests on the five test criteria of appropriateness, validity,
reliability, interpretability, and usability.


Appropriateness

Appropriateness, as you recall, refers to the fit of a test's items to
a set of given objectives. The better the fit between what a test
measures and what a teacher wants it to measure, the greater the
appropriateness of that test.

We can assume that teacher-built achievement tests have a
high degree of appropriateness, since teachers build them themselves.
Standardized achievement tests, on the other hand, are more
general; since they are intended for wider use, their coverage must
be broad enough to cover the objectives of many teachers. The
appropriateness of standardized achievement tests depends on
the extent to which a field of study has uniform and commonly
agreed-upon objectives. In a field like high school industrial arts, where
local curriculums may vary widely, standardized achievement tests
will be low in appropriateness (Tuckman and Corman, 1971). In
first grade reading on the other hand, where objectives from 
appropriateness will be higher 

nal "■'T “'™ ‘“'a, they know that the mate 

a covered in class However 

Objectives It measures are taught In other words, these obicctnes 
may be m the curriculum but not vet tauoht TuV. t, “PJCCtnes 
most stanH!irH„,,i . i . ^ 'auglit This happens because 

Se teTcher S^ ‘hc" <>■“= year of work 

StaXd appropriate only ,n a general sense 

tent in standard te«boo'ksin'the^*K^'^^ based largely on the con 
Where teachers follow the enm area for each grade level 

frequently the case especially textbook— as is 

ized achievement tests*^ill be hiaL'r™^"'”’^ level— standard 
manuals for each of the Zior !. i “PP'-PPnateness Test 
report item content categones bv achievement batteries 

This tells you the topics measured each item on each test 

bers of the specific items that me subtest and the num 

such a table of item Zen, cmeaoT''V“t '°P'= P^Se °f 
men. Tests has been repredZdZn ‘^“‘■furnia Achieve 

nature of achievement testeZZ ) ‘h= 

asedZZrZerZZrrltZ - 

sZ i““’‘™‘“‘'8°r'es for each suht .' Z^^ ““ a list 
T "“'Z arfmt ?b ^Sn-n-ng of each 

Ze \ “‘■‘= "‘= ‘e« 'S given Z “over 

‘bat have been coveZm H ““>^= areas or 

hers had not been taught re,„ For instance if modi 

Z ran breapeSt'V ° 36, and 38 on 

appropriateness of the .nslmmem for" ‘'educing the 

(IPIl in" rurnculum such as Individmn^"D^"“‘‘^ ‘b'® objective 
measure mimculum teache ^^^scribed Instruction 

teacherrcoui?? ' used’!^‘ ™ath test 

are referenced ?„ i ™" “ ‘b' basis forZZ ““bravement tests 

'd rn tenns of ,he speeijl ‘ Z‘=b“‘on these tests 
peei/ie object,^,, measure 





Figure 13.1 Language Item Content Category by Level for the California
Achievement Tests*

[Table not recoverable from the source. It lists, by test level, the
numbers of the items that measure each language content category:
standard English (case: nominative, objective, possessive; tense: simple,
with auxiliary; number; usage), sentence structure (simple, complex,
compound, incomplete), sentence parts and functions, inflectional
morphemes, and error types such as silent letter, double consonant,
vowel plus r, letter reversal, two words, mispronunciations, and no error.]

* From Test Coordinator's Handbook, Tiegs & Clark; reproduced by permission
of the publisher, CTB/McGraw-Hill, Monterey, CA 93940. Copyright ©
1970 by McGraw-Hill, Inc.





Validity 


Reliability 


In summary, then, it would seem that standardized achievement
test batteries are appropriate for the general purposes for
which they are used but should not be considered measures of the
attainment of each teacher's own objectives. (We will return to
this issue of specificity-generality of coverage in the discussion of
achievement and intelligence tests on page 382.)


The concept of validity has been applied to achievement testing
primarily in terms of college entrance testing and subsequent
mssed 'Hhi? cfe‘^ achfevement tests dis- 

learning expenences It is d'lffic'uli^?”'^' P“t 

built lists, ^h:;mU' 4 teVbat'f 

EbeuT96nk^ ’““T ■>' ^PProXen^s'”'^"' 

go beyond purely rSobj'KmKMdaHh 

by experts tend to be strong r content is reviewed 

judgment in deciding on\road*cn The use of expert 
standardized achievement tesK ? ?v.“ foverage does provide 
general learning ^ ^ validity as measures of 


ment test battenes Beiuse ulm""®!*" standardized aehieve- 
that Item analysis are steps m the ? ^is and revision based on 
pussible to eliminate Ir * ml 
rgh^egreeofmtemalrehabiliiy rrsistent items to achieve a 

“rent levels of the 
as th 3' abilities reported for other a'b (Tiegs and Clark, 
™ «■= m the sam '’“•‘OTos such 

™S;ther 

'^prot a 'bat high 

■cs. cniena 'ban to improve its statiltlhrother 





Figure 13.2 Reliabilities (Kuder-Richardson Formula 20) for the 1970
Edition of the California Achievement Tests*

Level    Grade    Reliability
1        1.6      .979
1        2.6      .982
2        2.6      .982
2        3.6      .982
2        4.6      .986
3        4.6      .978
3        5.6      .962
3        6.6      .983
4        6.6      .977
4        7.6      .981
4        8.9      .980
4        9.6      .982
5        9.6      .979
5        10.6     .977
5        11.6     .976
5        12.6     .981

* California Achievement Tests Test Manual, Tiegs and Clark; reproduced
by permission of the publisher, CTB/McGraw-Hill, Monterey, CA
93940. Copyright © 1970 by McGraw-Hill, Inc.
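
For readers who want to see where a Kuder-Richardson Formula 20 figure comes from, the following sketch computes it from right/wrong item scores. It uses the standard form of the formula, in which k is the number of items, p is the proportion passing each item, q is 1 - p, and the variance is that of the total scores. The function name and the data are made up for illustration; they are not taken from the California Achievement Tests.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for 0/1 item scores.

    responses: one list of 0/1 item scores per student (hypothetical data).
    KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores).
    Assumes at least two items and total scores that are not all equal.
    """
    k = len(responses[0])
    n = len(responses)
    totals = [sum(r) for r in responses]
    total_variance = pvariance(totals)  # population variance of total scores
    pq_sum = 0.0
    for i in range(k):
        p = sum(r[i] for r in responses) / n  # proportion passing item i
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / total_variance)


# Five students, four items (made-up scores).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 3))
```

A short classroom quiz will rarely approach the values in the table, partly because reliability tends to rise with the number of items and partly because inconsistent items have not been weeded out through tryout.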


Interpretability

There are two basic vehicles for the interpretation of achievement
data: norms and criteria for the evaluation of proficiency. Teacher-built
tests utilize criteria for the evaluation of proficiency while
standardized tests use norms. For illustrative purposes, Figure 13.3
shows a norms table for one level of one standardized achievement
battery, the Iowa Tests of Basic Skills (such norms tables are to
be found in the test manuals of all major achievement batteries).
Both types of interpretations are valuable and both have been
described in Chapter 11. Specific interpretation of standardized
achievement test results based on norms will be described in detail
later in this chapter, and the relative role of each kind of
interpretation will be described in the last chapter.

Since teachers give tests mainly to evaluate student progress,
their interpretations are usually based on proficiency criteria.
Hence, teachers' needs for test data are likely to be met directly
by their own tests. The kind of information that administrators
find useful is most readily obtained from standardized tests, which
enable them to compare classes within a school, schools within a
district, and districts within the nation.

Since each typically uses a different basis for interpretation,
teacher-built tests and published tests cannot be directly compared
in terms of interpretability. Moreover, each type should be seen
as serving a different function, one that is consistent with its
basis for interpretation. Testing in the schools is improved not
only by making better use of standardized tests but by helping
teachers to be better designers and developers of their own tests.
Standardized tests are not substitutes for teacher-built tests;
rather, they are supplements. Nonetheless, because results of
standardized tests often become part of a student's permanent
record, teachers should treat these tests with utmost seriousness.

Figure 13.3 Sample Percentile Norms Table for Level 7 (Grades 1.7-2.5) of the
Iowa Tests of Basic Skills Given at End of Year.*


Usability

Although test publishers provide highly specific instructions for
the administration of standardized achievement tests, these tests are
inevitably more difficult to administer than teacher-built tests.
Teacher-built tests, except perhaps for midterms or finals, are
confined to a single class period and hence do not disrupt the
schedule. Standardized achievement tests take three or more hours to
complete, usually in multiple sittings, and so usually necessitate
an alteration in the schedule for an entire school or school district.
Moreover, the infrequent use of standardized tests makes their
administration special to both teachers and students. With
understanding and a high degree of cooperation, standardized test
administration can be minimally disruptive. When teachers and
students alike fail to understand the value of these tests, the level of
disturbance increases accordingly.


COMPARING STANDARDIZED ACHIEVEMENT AND 
INTELLIGENCE TESTS 

Is there really a difference between a standardized achievement
test battery and an intelligence or mental ability test, or is the
difference more definitional than real? In this regard, Anastasi
(1968) makes the following observation:

It should now be apparent that all ability tests fall along a
continuum with regard to their dependence upon specified prior
experience. In this respect, traditional achievement tests, broad
achievement tests, and intelligence tests differ only in degree. (p. 485)

Anastasi (1968) thus offers the concept of a continuum defined by
the degree to which the skills measured by a test are based on a
specific experience (that has usually been created to meet a set of
objectives). A version of this continuum is shown in Figure 13.4.





Figure 13.4 A Continuum of Ability Tests

Objective-based Achievement Tests: teacher-built tests
General Achievement Batteries: California Achievement Tests, Stanford
    Achievement Test, Iowa Tests of Basic Skills
Group Verbal-type Mental Ability Tests: Otis-Lennon Mental Ability
    Test, Short Form Test of Academic Aptitude
Individual Intelligence Tests: Stanford-Binet, Wechsler Intelligence
    Scale for Children-Revised (Verbal)
Nonlanguage and Performance Tests: Wechsler Intelligence Scale for
    Children-Revised (Performance), Culture Fair Test


teacher™ at aZeremcm Stto' A characterized by 

he so called culture free intelhaenrl^.^ right end characterized by 

ttre the effect of become t ’ experiences that the 
case the concent 08““^ 

Sool h^ presented data to show 

ions “'V^cn'cnt battery measure ^“>>‘“13 front a 

ment i that the terms ?“ts antially identical func- 

Ssrm"l,™™“‘>f“-c SeMn "achieve 

They found ‘u“ a'tuations thaf are 1^^’ “"“® different 
Meaning ^cormg Imelhr° ‘•“'fcrent ) 

Xtm„:“?/‘f7hmUsdlt;"theS^^^^^ 

Levme (SsTd ™““" “'f cffecuven' “ ''f “f "ative 
trying to nreibi ' Jcscnbes a sttuatm!. Pf instruction 
‘ronics training pro^^ recruits would blTaS't'’ 

"lent test and ®T using both an ? ’’nss the elcc 

"I'elligence test) mU?' '^'assilication T “hieve 
’ rather than jus, the la« ° ' <“ aptitude or 

r alone it was possible 



to predict success more accurately. This suggests that the more
specific achievement test measures something somewhat beyond
what the more general aptitude or intelligence test measures.

However, Educational Testing Service (1960) reports a greater
interrelationship between the aptitude tests of the Graduate Record
Examination and the GRE achievement tests than between the
different aptitude tests themselves. For example, the Verbal Aptitude
score and scores on the Social Science or Humanities Achievement
tests correlate above .70, while the scores on Verbal and
Quantitative aptitudes have correlations under .50. This suggests
that the verbal-mathematic distinction may be more pointed
than the mental ability-achievement distinction.

Cronbach (1970) has also examined the relation between intelligence
or aptitude and achievement tests and posits a spectrum
of ability tests similar to Anastasi's. His continuum ranges from
tests that reflect the most adaptation or transfer of learning, such
as the nonverbal IQ tests, to tests that measure crystallized
achievements resulting from direct training (achievement tests).
Cronbach (1970) also shows an analysis of the Lorge-Thorndike
Verbal (LT-V) and Nonverbal (LT-NV) Intelligence Tests in
terms of their overlap with achievement tests. He sees the LT-V as
having 76% overlap with achievement tests and 15% distinctiveness,
while the LT-NV has 59% overlap and [...]

From all points of view, it must be concluded that the difference
between standardized achievement tests and intelligence tests
is one of degree of specificity or reliance on

We must be careful in interpreting inte igence e jjj 

considering past learning expenence and we mu c-onsKter 

mteipretml Standardized 

ing native capacity Each test nas sum 
from the other 


ADMINISTERING A STANDARDIZED ACHIEVEMENT TEST


For very young children read 

orally Thereafter ““P* ^ *^11 For all standardized tests cer 
by the student himself or herseir j^,r..ctration 

tain general rules can be established for administration 

u h^fnrp the testine date with all mstruc 
(1) FamiUanze yourself reading the Teachers 

nous for admmtstratwn This means reami g 

. The r . and 12« respectively are measurement errors 



Directions for Administering carefully and completely so that
you know exactly what to do. For example, some standardized
achievement test batteries come with practice tests that are to
be given a few days before the actual testing. Had you not
read the directions thoroughly, you might not have known
about this and hence might have overlooked it.
(2) Make sure all students have everything they need to take the
test and no more. Students must have number 2 lead pencils,
test booklets, and answer sheets. They should not have books
Ml scratch paper when allowed 

'’"P'' Informatwn Box must 
berned m by tHe student Tto. mformat.on typ.cally .ndudes 

For 

mfomauor'"'"’ requMted to fill m this 

fi==id= to change ,t Gtvmn Zr. 

cause variability mstudent^n^.^f.^ instructions may 

mutable for that testing ™'™“ 

moTt StZZtalZZvs'r ->'■=- 

<=11 you preciserhZ “”■“«»"- 

watch a seZr Zd t r: " z™" « 

stop each test exactly oZLe iZt Z' “<< 

ahead of time, the, may not go on to ^ <«' 

out before they are Zshed “ <™= <<«'« 

(6) Make sure that all students know '’™<'«'=toss 

any questions by reading /gam pnZ Answer 

rwerzr-^-“-=wC 

that "an 'ZdZs UK IZZZgT ™°"’ Check 

(81 ?r's“‘’”®«'’‘’“*P<''toedures ® “‘“•tons Be available tor 


as you would your own 



final exam. Show that you feel the results will help students;
do not show annoyance with the administration for wasting
your time. The students will sense your feelings and will
behave accordingly.


INTERPRETING STANDARDIZED ACHIEVEMENT TEST RESULTS

Standardized achievement test batteries may be _ 

but they are usually scored by machine to 
Testing companies have available a wide 

or formats of which two are pnmanly used by tocher “e 
individual test record and the class Us. ^ 

applications of these “^iJb^sed in lis chapter have 

the concepts of measurement that 

already been described m Chapter 11 The reader 

this section in conjunction with Ch P , Figure 16 3) 

individual test record appears m Our discussion 

win fo^ron each orth\^‘^hinds of scores reported and their 

interpretation * 

Raw Score. The raw score refers to the number of
items the student got right on a particular test or subtest (No RT).
This is sometimes reported along with the total number of items
on that subtest (No POS). (Because different subtests have
different numbers of items, raw scores cannot be compared from
subtest to subtest.) The raw score is a relatively meaningless piece
of information for describing performance on a standardized
achievement test.

Percentile Rank. A student's percentile rank (%ile or PR) is
the percentage of students in a norm group (either national or
local) that scored lower than he or she. It is usually computed as
the percent with a lower raw score plus one half of those with the
same raw score. If a student's raw score converts to a percentile
rank of 70, this means that approximately 70% of the students in
the norm group performed less well than he or she did. Thus, the

. A fnr the vanous standardized achievement 
> The kinds of scores reported^ u,e 

test batteries tend to g forms can be found in Chapter 15 

be similar Additional reporti g 




Reproduced by permission of the publisher. CTB/McGraw HtU 




Figure 13 6 Sample Class Analysis for the Stanford Achievement Test 



Brace Jovanovich Inc Reproduced 



percentile rank measures the student's standing on a
subtest relative to a group of other students who have taken the
test, the norm or comparison group. (Raw scores can be converted
to percentile ranks using conversion tables that the test publishers
provide. This is done for you if you use the computer scoring
service.) It is useful to compare the percentile ranks that each student
scores on a subtest relative to his or her scores on the other
subtests to get a picture of the individual's strengths and weaknesses.
(This comparison can be made even more directly among stanine
scores.)

It is important to emphasize that the percentile rank
reported by the test publisher represents percentage of students with
lower scores and not percentage of items answered correctly. A
percentile rank is a way of telling how well or poorly a student did
relative to students in the norming group.*
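
The computation rule given above (percent with a lower raw score plus one half of those with the same raw score) can be written out as a short sketch. The function name and the ten norm-group scores are made up for illustration; in practice the publisher's conversion tables do this work.

```python
def percentile_rank(raw_score, norm_scores):
    """Percent of the norm group scoring lower, plus half of those tied."""
    below = sum(1 for s in norm_scores if s < raw_score)
    same = sum(1 for s in norm_scores if s == raw_score)
    return 100 * (below + 0.5 * same) / len(norm_scores)


# Hypothetical norm group of ten students on a 40-item subtest.
norm_group = [12, 15, 15, 18, 20, 20, 20, 23, 25, 30]

raw = 20
print(percentile_rank(raw, norm_group))  # standing relative to the norm group
print(100 * raw / 40)                    # percent of items answered correctly
```

The two printed figures answer different questions, which is the distinction the paragraph above insists on: the first locates the student among other students, while the second says only how many items were answered correctly.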

Some test publishers also report a student's percentile rank
for each subtest within a percentile band. The percentile band
represents the range within which a student's true percentile rank is
likely to be found. This range is computed by adding one standard
error of measurement (an indication of the variability in the test
itself) to, and subtracting it from, the actual percentile rank the
student obtained. The individual test record shown in Figure 13.5 has a
section labeled National Percentile where each subtest performance
is represented by a series of X's. These X's stretch across the
percentile band surrounding that student's percentile rank on each
subtest. Where bands overlap from subtest to subtest, it is possible
that the student's performances on those two subtests are not
really different, even though percentile ranks may differ by as
many as 30 points. The less the bands overlap, the greater the
likelihood that the differences between subtest performances are real
differences as opposed to chance differences.
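
A percentile band and the overlap check can be sketched as follows. This is one plausible way of carrying out the idea, resting on assumptions that go beyond the text: the standard error of measurement is taken to be expressed in raw score units, the band is formed by converting the raw score minus and plus one standard error to percentile ranks, and a single made-up norm group is used for every subtest. All names and numbers are hypothetical, and publishers may compute their bands differently.

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring lower, plus half of those tied."""
    below = sum(1 for s in norm_scores if s < score)
    same = sum(1 for s in norm_scores if s == score)
    return 100 * (below + 0.5 * same) / len(norm_scores)


def percentile_band(raw_score, norm_scores, sem):
    """Band from one standard error below to one above the raw score.

    sem: standard error of measurement in raw score units (assumed given).
    """
    low = percentile_rank(raw_score - sem, norm_scores)
    high = percentile_rank(raw_score + sem, norm_scores)
    return low, high


def bands_overlap(band_a, band_b):
    """True when two (low, high) bands share any part of their range."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]


# Hypothetical norm group: raw scores 1 through 40, one student at each.
norm_group = list(range(1, 41))

subtest_a = percentile_band(20, norm_group, sem=3)
subtest_b = percentile_band(25, norm_group, sem=3)
subtest_c = percentile_band(32, norm_group, sem=3)

print(subtest_a, subtest_b, bands_overlap(subtest_a, subtest_b))
print(subtest_a, subtest_c, bands_overlap(subtest_a, subtest_c))
```

Where the check reports an overlap, the difference between the two subtest performances may be due to chance; where it does not, the difference is more likely to be real.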


Stanine. The stanine (S) is a standard score (i.e., a score based
on the deviation of scores of the norming group from the mean
of that group) on a scale of nine
units. The scores may be interpreted as follows:

9 - highest level
8 - high level
7 - well above average
6 - slightly above average
5 - average
4 - slightly below average
3 - well below average
2 - low level
1 - lowest level

* See page 254 for a stanine-percentile rank conversion table.
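
The link between percentile ranks and stanines can be illustrated with a small sketch. It assumes the conventional stanine distribution, in which the nine units take in 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the norm group from lowest to highest, so that the cumulative cut points fall at percentile ranks 4, 11, 23, 40, 60, 77, 89, and 96. How a score falling exactly on a cut point is treated is an assumption made here; a publisher's own conversion table should be preferred where it differs.

```python
from bisect import bisect_right

# Cumulative percentages closing stanines 1 through 8 (conventional values).
STANINE_CUT_POINTS = [4, 11, 23, 40, 60, 77, 89, 96]

STANINE_LABELS = {
    9: "highest level", 8: "high level", 7: "well above average",
    6: "slightly above average", 5: "average",
    4: "slightly below average", 3: "well below average",
    2: "low level", 1: "lowest level",
}


def stanine_from_percentile(percentile_rank):
    """Convert a percentile rank (0-100) to a stanine (1-9).

    A rank exactly on a cut point is placed in the higher stanine
    (an assumption of this sketch).
    """
    return bisect_right(STANINE_CUT_POINTS, percentile_rank) + 1


for pr in (3, 50, 70, 97):
    s = stanine_from_percentile(pr)
    print(pr, s, STANINE_LABELS[s])
```

Because each stanine spans a range of percentile ranks, small and probably meaningless differences between subtests tend to disappear when scores are expressed this way.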


Grade Equivalents. Scores on standardized achievement test
batteries are often given as grade equivalents (GE). The grade
equivalent that corresponds to a particular raw score represents
the year (i.e., grade level) and month of school of students who
obtained that raw score as their median score (score obtained by
the student in the middle of the distribution). If one of your fourth
grade students, for example, got a grade equivalent of 5.4 on a
subtest, that would mean that among all the students in the norming
group who were in the fourth month of the fifth grade of school,
the median score on the given subtest was the same as that
obtained by your fourth grade student. We could say that in that
particular area, your student was performing like an average
student in the fourth month of the fifth grade. (See
Chapter 11 for a more complete description of grade equivalent
scores.)
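
The logic of a grade equivalent can be sketched as a table lookup. The example assumes a norming study that measured the median raw score at only a few grade placements, with values in between found by straight-line interpolation; the placements, medians, and function name are all made up for illustration and do not come from any published battery.

```python
def grade_equivalent(raw_score, norm_medians):
    """Find the grade placement whose median raw score matches raw_score.

    norm_medians: (grade_placement, median_raw_score) pairs, for example
    (4.8, 26) for the eighth month of grade 4. Medians are assumed to
    increase strictly with grade placement. Values between tabled points
    are interpolated linearly; extrapolation is not attempted.
    """
    points = sorted(norm_medians)
    for (g0, m0), (g1, m1) in zip(points, points[1:]):
        if m0 <= raw_score <= m1:
            fraction = (raw_score - m0) / (m1 - m0)
            return round(g0 + fraction * (g1 - g0), 1)
    raise ValueError("raw score outside the tabled range")


# Hypothetical medians from a norming sample tested late in the school year.
norms = [(3.8, 18), (4.8, 26), (5.8, 34)]

print(grade_equivalent(31, norms))
```

A raw score of 31 falls between the fourth grade and fifth grade medians in this made-up table, so the student is assigned a grade equivalent between 4.8 and 5.8 even though no fifth graders in that month were actually tested.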

""it I important to note that a GE of 5 4 
fourth grader at the same level as the fifth gra , ^ 
on that subtest It only tells you that he or she In 

same level as the fifth be 

your school many J je equivalents are com 

performing above grade level 8 fourth 

puted in tenns of ^ 

graders may be performing ^ , performing above grade 

behind your fifth graders who are P 

tliat norming groups are large, highly 
It must be udunts Irom LioL throughout the 

representative g™ups of slu important since 

country „k against which all test takers are 

they represent the bonchma k uutof 

compared A weakness occur ygn, The scores in your 

date but renorming can soi 

• Recall that grade 

while mental ages (see ChaP jgj gges may be based on a s 

grade equivalent scores hke other nme points m that year be g 


grade equivalent scui^ n 

administered during t"® ^ extrapolation 
determined by interpolation or exir p 



school will not become part of the norm. The norm group is a
distinct group chosen representatively for standardization purposes.


Standard Score. A standard score is one based on a given
mean and standard deviation. Such standard scores, reported by
some publishers, are useful for charting an individual student's
growth or development on a given test or subtest over the course
suSests ‘'f Puposes of companng scores between 


men^M battt''cf rtf ““i! achieve- 

mental abilitv or ^ administered together with a 

ment grade eqmX Th^ achteve- 

formanX“L LntaStv®o; <■ = . Per- 

in question It is an mdicatio/oF student 

relative to others of the same inteTlio^ ^ ajudent is performing 
basis for mtepretmg grade ennm t used as a 

they might have becn*ex^ted to ^“SSesting whether 

pvrformancc of students of lit intllTgSce 

Item Performance. In aditiii 

acores(snchasthosedescnbedX V °t reporting 

■n tenns of student “n ba ported 

shojvn in Figure 13J) along vvilh ‘ndividual Item {as 

("*c"°™‘"B8roup and/or local sanf?'^™'"®" students 

(as shown in Figure 13 9) Th^ f ^'^“'"2 each item right 
transition between uorm refeencX f «P“«tng represents a 
•THe publishers rafarencing 

Prs “TaTauf fj? 

'•S"e“Sf„m.a 

prcdicti\e purposes 



student progress, diagnose strengths and weaknesses, and
prescribe instruction.


While teachers can construct their own criterion-referenced
tests, published versions give them a higher degree of quality
control of the test instrument and potentially good reporting features.
Teachers can then interpret the results in the light of their own
instructional objectives.

It is possible to build a 'standardized" achievement test that 
*’y “’‘“"S °"'y “I': of the three criteria 
aS^eveme„r “I ““S' recall that standardized 

*“ "“■oo properties ( 1 ) items 

(This also sugKsts a referenced 

verting a noSere„«d^*®™« 'bird P™P=^‘y-for con- 
illustrated ui Box 13 1 ) ^ entenon referenced one as 


Some Published
Criterion-
referenced
Achievement
Tests


set of 34 " SlvmS"smTed*or"*?'^ “ 

mathematics The Inveoto^t ">P'“ >" 

Bureau/McGrawHiIl m thme levd Th^°” California Test 
those objectives on which nrofine c “"ng report indicates 
than descnbmgperfo™a„i™®-'7 attained Rather 

of the achievement batteries previous ““S’o score, as is true 
escribes performance on each obipn ^ ascribed, this inventory 
filo of the student s mathemauefaej F® o prm 

f®r?'tud=nts a response fomat ttfF”®”' ““veover, this test 

or five response alternatives Unf™ f *om to four 

P^fioiency on an 

tonnes*®® ''‘“oP'o ts Houghton P'f "nPonce of one item 

objective mea rocordmg Dumi and instruc- 

‘-"S-SSSi? 





dents (and, as we shall see later, the performance of the class as a 
group). Testing systems such as IPMS combine the best features 
of standardized tests (e.g., high quality items, hence high reliabil- 
ity) with the best features of teacher built tests (eg , high appro- 


Box 13.1 


GETTING CRITERION-REFERENCED INFORMATION 

The functional difference between a ctitetlon-referenced (CR) and 
standardized or norm-referenced (NH) test is threefold: 

(1) CR scores are spechcally targeted to 
are more global, hence you get more f 

cause there Is one score per objective) but there are fewer Hems 

(2) NR^coJe; can be interpreted vra 

while CR scores require you to set your own culoll lor adequate 

(3) &R';:rdlm’s are written to Produce " -““^rt^repms^^ 
H?o° domm^o? the^ob'ective as the item writer sees if 

Alter examining these i^l. 

are written using a content outline or sk ^ 

It does not seem unreasonable m or cement area 

CR tost provided that the publisher rep „„ 5,ondardized) 

The large publishers will ‘rAchlev menr^S or Iowa Test, of Basle 
tests such as the Metropolitan Achio 

Skills. 10 bo mofo holpful to 

CR interpretations of tost res j^o points must bo 

teachers than NR interpretations ts contents moasuicd 

kept in mind. First, it is important and that there are a 

are specifically the ones you nee liability purposes Second, smeo 

sufficient number of items per s i .. aniialion behveen high and lo.«r 
the items are written for -cores of your students on iho lost 

scorers, you may have to yajo pioficicncy on each skilt (rather 

to decido what cutoHs to uso for ^ norm-refcrcnc* 

than using a fixed cutoff). In a would bo based on iho loialive 

Ing bocQuso Iho choice of culo» P« numbor of .terns ans.*c»ed 

porformanco of students— lor ox 

correctly by 70“« of the students. 





pnateness) Teachers can use IPMS to create in effect their own 
tests that is tests that are appropnate to measure their own 
objectives and monitor student progress on them A sample list of 
objectives on the IPMS Reading Test appears in Figure 13 7 
curremirr cntenon referenced achievement test 

vXsHIs M ^ for grades 3-S is the Harcourt Brace lovano- 

th^sv^tem^s H ‘ho others 

this system is designed to provide a basis for diagnosing the needs 

SUl MinTs u organized into 

sSe ivH ?s:rEa1h^^t^tl^/ “d^* 

Systems like SMS more readilv'hete't ™™o<ftate feedback 
"tent on their own objectiVK^hai*^ 

teries because of the uresencrof I i * "“rm referenced bat 

numbers of items per objectire multiple 

results that focus on prohciencvnerob for reporting 

testing system can be adapted “ modularized 

adapted by teachers to fit their own needs 


Reporting
Criterion-
referenced
Results


The printout (Figure aP . i 

Individual Diagnostic Matnx Xh^Dh? called th 

viated names in the left hand boLt^ Th 

box represent the numbers of the i,/^" alongside ead 

•■v= In an actual printout a pt '!™ *“' ““sure each objec 

^inber If the student anstred'^^ “r""f *= .ten 

answered incorrectly “rrectly and a minus if he or shi 

f-b'/r" referenced tests cat 

unS^Tan “omx linked to (thS examining per 

a;;:arsrp"'“™“^ •- 

“°‘°Xdoes tbTf A‘=b-evement Test! 
dso reports class^a °bjective (by pup,, “p™ “''Port individual 
•'onofstudem? ‘*®P''°“‘systeiifoeTil^"““ number) il 
record enables i?'*'”® ''’P “0™ "ghton th ‘be propor 

formance ™" “> ‘''eot f”.ndi”d“r ‘ This 



Figure 13.7 Some Sample Behavioral Objectives from the Individual Pupil
Monitoring System for Reading, Word Attack Booklet, Level 5*

PHONICS 


CONSONANT IDENTIFICATION 


501. Consonants, Beginning. Middle, & Ending 
blend (e g str, nip) or digraph (e g di. ng 


- Blends & Digraphs: Identify a consonant 
tdt) that will complete an mcomplete word in a 


sentence 


CONSONANT SUBSTITUTION 

502. Consonants. Beginning. Middle, & 7 for'^speafied IcSol a pnntfd 

consonant, blend (e g str, nip), or digraph (e g c/i ng, icn, y 

word, forming a different real word 

VOWELS 

« h, irf^nhfv the correct phonetic pronunciation of a vowel 

503. Vowel Combinations - Oigraphs Id r^, vowel (e g 

combination m a word m which me iw 

ea, ai, oa) 

« Identify the vowel combmation(flU.ou Of, oy mv. 

'“■oro1o“r,Uomp;?.= an 

505. r-, w-, and l-Controlled Vowels: Match a word that has an r-, w-, or l-controlled vowel to a
word that has the same vowel sound

506. Vowels, Short vs. Long: Identify the vowel sound as short or long in a printed word
(vowel + final e, closed syllable, open syllable)


VAKIAiyit> 

. f-hoose the correct stable consonant symbol that represents 

507. Variant Consonant So“"ds. CM ^ ^ (e as L ors g asg or) or zh pli as / s 

the sound made by a speci i 

assorzors/..c)iasc).or* edasdorr) 

, . Identify the unsounded consonant in a word with the pattern An sr 

508. Silent Consonants. Idenuiy 

"Z vowels (Word Blending): Identify the group of letters that w.U complete an 
509 Consonants + Vowels |WO ^ scream) 

mcomplete word (eg scr + eam 


* Reproduced by permission of Houghton Mifflin Company.









[Figure: sample printout of mathematics objectives with their item numbers. Legible objective labels include Estimation, Non-standard Units, Inch/Foot/Yard, Temperature, Area, Volume, Metric System, Graphs, Money, Weight, Clock, Points, Line and Line Segment, Regions, Circles, Symmetry, Prisms and Pyramids, Ray and Angle, Intersection, Parallels and Perpendiculars, and Polygon Classification. Reproduced by permission of the publisher, CTB/McGraw-Hill, Monterey, CA 93940. Copyright © McGraw-Hill, Inc.]





[Figure: sample item-analysis printout from the Stanford Achievement Test or Test of Academic Skills (criterion-referenced scoring and interpretation). The report covers the Math Concepts test, Primary 3 A, grade 3.8, dated 04/15/74, and groups items by instructional objective (for example, "The pupil identifies number which is at a particular point on a number line"), reporting item group mean p-values.]



MEASURING ACHIEVEMENT IN A SELECTED AREA: READING

All published achievement testing is not restricted to the kinds of

throughout most of 

1 cm fo "hm“tm tn:r“ 

ness edneauon End'^h foe ar^ achievement tests in bnsi 
agnculture dnve/ed..!- , “‘ j ^ languages, mathematics, 

philosophy, psvcholopv economics, industrial arts, 

studies and specific TOcatmns“(e\‘‘CTr"'^“‘’'"®'^‘^'““'^°‘'“' 

of the specific area arhim ' ^ nursing) Of all 

used are"^ tests Tr^^TT^^T' most widely 

tasts --teadmg.extbooksamroU“n’;St'lh^m^^^^ 

Achlcvcm^l S and” pre°scnpt”7 t°est's achievement test bat 

there are «andardLd VehietemernTT^t ^-thtavement. 

cxdusitely for the o r!lJ ‘ 

often enable teacher and readme achievement and that 

IS? 

m mo levels gradel^-n’afo g^d'L ImI ^t-atlable 

lmon"’'l!"u" “ the need for ' ““ hour 

among high school and junior and ^ '”“"‘‘""8 reading skills 
tt Jields mo scores-lcvTo? .“"t* community college students 
prehension This test f,7“P;=''=''rt<>t> and spfed of e"^ 
sh?s h h standardized has\iBh"r”'i “ hand 

m°rs nrS ""'^‘t'fttnns with kSishtnr'V' reliability, and 

There is a series of 
F ^ades «. Su4y fc. Test (A, gn.de 

The Pnmaiy conin' ’ ‘'aad.ncss Sk’ifo^''^ grades 

andiakvsAflm^ ‘ *'"“aublesls vocabnl (grades K-l) 

antes lo administer (Form comprehension 

trormes for grades 2-3 meas 






. takes only seven minutes to admm 

ures speed and accuracy sublests, vocabulary, com 

.ster ) The Survey ^ mke 45 imnutes to 

prehension, and speed J subtests which 

administer The Readiness SIunsT«.^_^__^ P^'mary and Survey 
are administered in four hat editions 

Tests are available in mach ^ ^ 

the Readiness Shills Test i 

only The primary ''“‘“‘if j^j„encies Unlike many of the other 
cation of specific , average for very young children 

reading tests, these kindergarteners and first graders 

including f _,i,ers College Press and distributed by 

They are published by leacuei 
The Psychological Corporation 

j r(‘<its (1973) These are available at 
Iowa Silent ® 2 grades 9-14, 3, grades II and up for 

three levels 1. orovides scores for vocabulary, reading 

accelerated students Ea P 





orhn?b'r''°i’' ^“t"* (•h's 'after uniquely reflects 
on both speed and accuracy) Level I knd 1 tests allo provide a 

tair t^S a S 1?? T Working time for Levels 1 and 2 is an 

one kvel°("„°re''eS*dy“‘ at 

IS enjoying somewhat widesprefd'tf lubfiu®"'^ “ 

and reading rate Th,; seems to be a '^od mst 


BOX 13.2


NEW DIRECTIONS IN SCHOOL ACHIEVEMENT TESTING

these The f.tst t 

el» In operation) and Is chaiaSrtzed bv^r°"a^' ^'“'a'nan' Program i 
achievement battery on a slalewtde basis to " ^ commo 

tnr s, achoel distriL J ' f "''ongths and deli 

^s poblio sohools Of a state am ^,or, „ ™"0“s grade levels „ 

ine second new dirpn 

°X° ma'th' ^“'"'^.eTe^emt;" "“--men 

Achievemeni Tp,?^" Stanford AchipJ ^ Pleasured by thf 

'hcirowrd, me -too' P'srots arl „ 

'^ent of these anT' * '^easurino th ^'’Saged in setting 

--.stri'Catr 





tion for screening purposes. (There is now available a revised
edition of the Nelson Reading Test, suitable for grades 3-9, which
can also be completed in half an hour.) It is published by Houghton
Mifflin.


Stanford Diagnostic Reading Test (1973). This test comes in
two levels: Level 1 for grades 2.5-4.5, and Level 2 for grades 4.5-8.5.
Both levels have subtests in reading comprehension, vocabulary,
syllabication, sound discrimination, and blending. Level 1
also has subtests in auditory discrimination and beginning and
ending sounds, while Level 2 has a rate of reading subtest. Most
noteworthy about this test is that it utilizes the diagnostic-prescriptive
approach. Diagnostic testing is concerned with in-depth
coverage in a single curricular area, in this case reading.
The subtests are specific enough (seven for Level 1, six for Level 2)
to enable teachers or reading specialists to identify individual student
weaknesses in reading skills. Rather than trying to discover
what a group of students know, this test helps uncover the specific
skills on which each pupil needs help so that remediation can be
prescribed. Testing time (not including

Level 1 is two hours and 17 minutes, for Level 2
half, but neither need be taken in a single sitting. It is published
by Harcourt Brace Jovanovich.










Self test of Proficiency 


(1) A statement or brief story is read aloud by the teacher. A question
based on the story is then read aloud along with four answer choices.
The student must identify the correct answer choice. This procedure
is used on an achievement test to measure
a. reading comprehension
b. word skills
c. language mechanics
d. listening comprehension
e. vocabulary


(2) Many standardized tests measure three aspects of achievement in
mathematics: computation, concepts, and applications. Below are
three sample items. Which of the three aspects does each measure?
a You place $50 in a savings account that earns simple interest at 
the rate of 4V4% per year At the end of one year your bank 
balance would be 

$52 50 $52 00 $52 75 $52 25 

b /T4^ 40 38 35 30 

C 10‘is 1 000 10 000 100 000 1 000 000 


(3) Consider the comparison between a teacher-built achievement test,
such as a midterm, and a standardized achievement test. Which one
of the five test criteria (i.e., appropriateness, validity, reliability,
interpretability, usability) do you think most favors the teacher-built
test (at least ideally), and why?

(4) Continuing with the consideration in item 3, which of the five test
criteria most favors the standardized achievement test, and why?

(5) There is a clear and distinct separation between what an intelligence
test measures and what an achievement test measures.

TRUE FALSE


(6) Below are five categories of tests. Number them from 1 to 5 to indicate
their place in a continuum with regard to dependence upon
specified prior knowledge, with number 1 being the most dependent.
a. group verbal-type mental ability tests
b. individual intelligence tests
c. general achievement batteries
d. nonlanguage and performance tests
e. objective-based achievement tests

(7) Cite two reasons why it is important to familiarize yourself with all
instructions before administering a standardized test.





(8) In addition to familiarizing yourself with all instructions before

totw'in to'.? a '““f o'ter iniporlant rules to 

lOMow in test administration 

(9) Match each type of score on the left with its description on the right.

a. raw score
b. percentile rank
c. stanine
d. grade equivalent
e. standard achievement score
f. anticipated achievement score
g. item performance

i. one of nine scale scores reflecting performance relative to a norming group
ii. the number of items the student got right
iii. report of scores on each individual test item
iv. a score in standard deviation units that reflects the distance from the norming group mean
v. the percentage of students scoring lower
vi. the number of students who took the test
vii. a score based on the performance of students of the same age and IQ
viii. a score expressed as grade level of median norm group student who achieved it

(10) At the

ai:tiievementtos,to°e 3 ^“"'’ ‘““K a standardized 

a '""='=^"1 >ype ut score, that is, 

'-■-ube,toe,ypectscL“::reu?rr:;3r„:: -- 

// 65 1 5 S D 

III 61 right ^ ^ 

formance?^^''’ uso to describe Deedies per- 

I h'ghest level 

" Wll above average '' 

,11 lov,,,,,..... V slightly below average 


Idwesl level 

* 'Sir° r ciT/d^'” ■" -p- 





(12) Percentage figures are used in both criterion-referenced tests and
in norm-referenced tests. Describe the difference between performances
reported as 85% on a criterion-referenced test and 85th percentile
on a norm-referenced test.

(13) List three characteristics on which reading tests can differ.

(14) Using this book as a test reference source, identify
a. a reading test that can be used at the junior college level
b. a reading test that can be used in kindergarten
c. a reading test that can be used for sixth graders



chapter fourteen / Measuring Interests, Attitudes, and Personality Orientation


OBJECTIVES

1. Distinguish among the following tests of interests and
career-related orientations in terms of what each measures and how
it measures it: a. Strong Vocational Interest Blank, b. Kuder
General Interest Survey, c. Ohio Vocational Interest Survey,
d. Self-directed Search, and e. Career Maturity Inventory.

2. Distinguish between attitudes toward self and school and their
measurements by the a. Tennessee Self-concept Scale,
b. Self-appraisal Inventory, and c. School Sentiment Index.

3. Identify measurement for the concepts of personality, adjustment,
needs, and values by means of tests, namely a. California
Psychological Inventory, b. California Test of Personality,
c. Edwards Personal Preference Schedule, d. Scale of Values, and
e. Embedded Figures Test.

4. State potential uses for affective measures.



MEASURING INTERESTS AND CAREER ORIENTATION 


It is not uncommon for guidance counselors to administer
measures of interests and career orientation to students. Such
information is useful for students in making educational and
career decisions and for counselors, teachers, and parents in
assisting the decision-making process. It is important in interpreting
these tests that we know what they measure and how they measure
it. This section will describe the more commonly used tests.

The tests described in this section should also be viewed from
the perspective of career education and its many efforts across the
country to assist students in choosing and preparing for careers.
Certainly, interest testing can be considered a component of the
process of gaining self and ultimately career awareness.


In the area of interest testing the Strong 
mark as the Stanford Bmet in intelligence 

Considered one of the most thoroughly s “ . , ^ its 

test instruments m campbeV 

current revision (1974) is . c „„h college stu 

Inventory Although originally designe business studies 

dents and adults employed “} ,c,, ,be test can be used 

(Carter, 1940, Strong ?ha. Strong 

With persons as young as 14 or 15 y e Originally 

scores are rather well fixed between the ages 

separate forms, similarly j,j|l edilion these forms have 

women but in the new Strong P 
been merged into a single one 

Content. The new SVIB form contains 325 items grouped by
the type of content. The seven content areas are

occupations (131 items)
school subjects (36 items)
activities (51 items)
amusements (e.g., games, sports) (39 items)
types of people (24 items)
preference between two activities (30 items)
your characteristics (14 items)

X. cvtR the Manual for tlie oVlii 
at. puetisw by Stantom 

University Press 


The Strong Vocational Interest Blank (SVIB)




wouId^'ikTdXrtLTd 10 Uhelhcr he or she 

The test is untuned but about an hnu ® i,' 'ndilTercnt to it 
administration An example of “"“"‘td for its 

P-.eniedafterthoseonTh?s 4 ™‘pSet"^'- 

Jaz 2 musician ^'^^Ufjerent Disltkc 

History 

Giving a lecture 
PJaying backgammon 

People who travel 

Gotog to a basketball game vs p , 

fngtoamuseum Neutral 

mcanputotherpeopleatease 


Prefer 

No 


‘ — —tac xes J 

Scoring The e«i 

pattern to the resnoi«'"* '““Pares the simT'"® P'P'odurc 
tions to see which^oiiel'’‘’“°"' “f P““Ple in a * * response 

deal gives eacl, ,te „ ^ ‘a closest to “ocupa 

'^’nch the answers of ml a weighUalf,! "“i’™'’ 

“farm, net, on 0/ We, on, “ F-E“re 14 I 

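Figure 14.1 [Table showing, for one SVIB item, the percentage of engineers and of men in general giving each response, the difference between the two groups, and the resulting scoring weight. Reprinted by permission of Stanford University Press.]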

“at he would hke 





to be an actor, one point would be deducted from his engineering
score (based on a weight of -1). He has shown on this item that
he is more like other men than like male engineers. If he said he
was indifferent to the acting profession on item 1, he would neither
gain nor lose a point on his engineering score (weight = 0). If he
said he disliked actor or actress (which agrees with 60% of the
engineers), his engineering score would gain a point (weight = +1).
Note that engineers' dislike for the acting profession exceeds that
of men in general by 13%.²

A student's total score for a given occupation, such as engineer,
is computed by adding the weights based on the correspondence
between the student's responses and the responses of same-sex
engineers on each of the 325 items. The results tell the student how
closely his or her likes and dislikes resemble those of a sample of
professional engineers of the same sex as the student. The procedure
would be repeated, that is, each of the 325 items would be
scored again, in order to get the total score for another occupation
or for another reference group (e.g., women). Each occupation
score is based on a different occupational sample, and hence each
scoring key has its own weights. (Since the SVIB can be scored
for 124 occupations, the necessity for machine scoring is obvious.)
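The weighting procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `occupation_score`, the key layout, and all items other than "actor" are hypothetical. Only the actor weights on the engineer key (-1 for like, 0 for indifferent, +1 for dislike) come from the example in the text.

```python
# Hypothetical sketch of criterion-keyed scoring; not the publisher's scoring program.
# A scoring key maps each item to a weight for every possible response.
ENGINEER_KEY = {
    # Weights for the actor item follow the worked example in the text.
    "actor": {"like": -1, "indifferent": 0, "dislike": 1},
    # The weights below are invented purely for illustration.
    "jazz musician": {"like": 0, "indifferent": 0, "dislike": 0},
    "giving a lecture": {"like": 1, "indifferent": 0, "dislike": -1},
}


def occupation_score(responses, key):
    """Add up the key's weight for the response chosen on each keyed item."""
    return sum(key[item][answer] for item, answer in responses.items() if item in key)


student = {"actor": "dislike", "jazz musician": "like", "giving a lecture": "like"}
print(occupation_score(student, ENGINEER_KEY))  # 1 + 0 + 1 = 2
```

Scoring the same student for a second occupation would reuse `student` unchanged and pass a different key, which is why each additional occupation requires another full pass over the items.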

The SVIB makes no assumptions about men or women in the
different occupations. It relies entirely on responses to the test items
to define the likes and dislikes of satisfied people in each occupation.
The weights are not based on expectations and preconceptions but
on actual responses of people, and a student's inclination toward
a particular job is based on how similarly he or she responds to
people of the same sex in that job. Naturally, much depends on
how representative the criterion sample group is, and some question
has been raised about whether the chosen sample of people
is truly like the majority of successful people in those
fields (Super and Crites, 1962). However, scoring keys for occupations
not included in previous forms have been added and old
ones updated on the 1974 revision.

Correlations between scores on the SVIB and intelligence test
scores vary from one occupational SVIB score to another. Correlations
between intelligence and scores on the SVIB psychologist


2 We could obtain a score on our hypothetical young man by comparing
his responses to those of a general reference group. For a young woman,
both a general reference group and women as a specific reference group
would be used as a normative source. While both sexes use the same test
form and profile report form, separate norms by sex as well as combined
norms are used to interpret scores on the SVIB-SCII. Separate norms are
used to reflect male-female differences in reported interests.





iey, for example, are ax htgh positive as 43 while '"'f 
co^r^elattons with scores oo the SVIB 
person" key are as high negative as -.40 (Strong, 1943). The
manual reports split-half reliability coefficients averaging .88 for
college seniors, while B. P. Campbell (1966a, b) reports a
retest correlation of .67 for college seniors. Considerable fluctuations
in interest profiles are found for students below age 17 or 18
on the SVIB (and on the Kuder General Interest Survey, described
next; Crites, 1969).


Application. A sample profile form for reporting results on
the 124 occupational scales appears in Figure 14.2. The 1974
Strong-Campbell also reports results on John Holland's six personality
types (called "general occupational themes" here), as described
on page 420 in conjunction with the Self-directed Search. In addition,
scores on 23 "basic interest scales" are reported (also in
conjunction with Holland's typology), enabling students to link
their interests with more general occupational areas (e.g., "medical
science").

In using the SVIB-SCII, the counselor would be wise to heed
the advice of Darley and Hagenah (1955) to base interpretations
on patterns (as would be reported in Figure 14.2 and the other
profile form on occupational themes and areas) rather than on
particular occupational keys.


Kuder General Interest Survey, Form E

First published in 1939 as the Kuder Preference Record-Vocational
and revised in 1964 as the Kuder Form E General Interest Survey,
the Kuder is used with high school students, college students, and
adults of both sexes.* It is published by Science Research Associates.
Stefflre (1947) determined the vocabulary level of the old
Kuder to be two years lower than the Strong, suggesting that
it can be used with the more able eighth graders and certainly
with ninth and tenth graders. Malcolm (1950), in comparing
earlier forms of the tests, found that the Kuder was preferred to
the Strong for both sexes at the high school level because it
seemed to lead to a greater understanding of interests.

One shortcoming of the Kuder is its susceptibility to faking,
that is, consciously and knowingly distorting responses. This is


’ Dp Occupauonal Interest Survey was first oubUshed in 



Figure 14.2 Sample Profile Form for Reporting Occupational Scale Results on
the SVIB-SCII (Stanford University Press, 1974)





rarfCularU true tvhen the scale .s used m conjunct.on 
^ /iTorrtn However when used with high school 

students for their own counseling benefit, faking is much less likely
to occur. Mallinson and Crumrine (1952) report data from high
school students tested in the ninth grade and retested in the twelfth
grade that suggest that Kuder interests are stable enough to afford
a basis for prediction in counseling yet changeable enough to allow
for suitable modification through counseling.*


Content. The Kuder E contains 168 items that assess interest
in a total of 10 different interest areas. Each item consists of three
choices of activities, of which the student must indicate the one he
or she likes best (which he or she marks as his or her first
choice) and the one he or she likes least (which he or she marks as
his or her third choice). The activities in each item are written to
tap three different types of interest. Consider the illustrative item
below.

Build bird houses
Write articles about birds
Draw sketches of birds

The first choice taps mechanical interest, the second literary interest,
and the third artistic interest. The other interest areas are
outdoor, computational, scientific, persuasive, musical, social service,
and clerical.

There is no time limit on the test; it takes students from
45 minutes to one hour. Answers are indicated by punching holes
in the designated spot with a pin. Machine scoring forms are also
available.


Scoring. Students can score their own tests, convert the
scores to percentiles, and plot the results graphically. With the
Kuder, one examines the relative strength of each of 10 different
interests within an individual, while in the Strong, comparisons are
all with groups of individuals classified by occupations. Unlike
the Strong, Kuder scores are not generated by matching responses
directly with occupations, nor are profiles evaluated in contrast to
those from occupational groups. Item choices in the Kuder are
grouped to form ten interest-type scales based on a combination of
face validity considerations (which of the 10 types each choice






"Now, the trick is to pick a profession that won't be obsolete when
you're ready to enter it."


seems to fit) and a correlational analysis of the responses of sample
students to identify item choice combinations that are related to one
another and distinct from other clusters. Each item choice can be
predesignated to fit a cluster. As students choose within the
forced-choice format, the number of choices per interest type is
then counted to yield scores that can be converted to percentiles
based on samples of high school students. While the Strong scoring
was referred to as criterion-keying (comparing a choice to a criterion
group), Kuder scoring is best described as item cluster tallying,
wherein the number of choices in each cluster is counted and compared.
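A rough Python sketch of item cluster tallying follows. It is a simplification built on assumptions rather than the published scoring procedure: it tallies only the activity marked as first choice, the second triad is invented for illustration, and the names `ITEMS` and `tally_first_choices` are hypothetical. The bird triad and its mechanical, literary, and artistic assignments come from the text.

```python
from collections import Counter

# Each item is a triad of activities, each predesignated to one interest cluster.
ITEMS = [
    {   # Triad and cluster assignments taken from the illustrative item in the text.
        "Build bird houses": "mechanical",
        "Write articles about birds": "literary",
        "Draw sketches of birds": "artistic",
    },
    {   # Invented triad, for illustration only.
        "Repair a clock": "mechanical",
        "Write a short story": "literary",
        "Paint a landscape": "artistic",
    },
]


def tally_first_choices(first_choices):
    """Count, per interest cluster, how often its activity was the first choice."""
    tally = Counter()
    for item, choice in zip(ITEMS, first_choices):
        tally[item[choice]] += 1
    return tally


print(tally_first_choices(["Draw sketches of birds", "Paint a landscape"]))
```

The resulting counts play the role of raw cluster scores; converting them to percentiles would require norm tables, which are not reproduced here.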

Application. A sample Kuder profile appears in Figure 14.3.
It is used as a basis for interpreting the pattern of a student's
interests and recommending directions that are consistent with
that interest pattern. To aid the interpretation process, recent
editions of the manual have provided occupational profiles,
supplemented by curricular profiles for a number of different fields,
obtained by administering the Kuder to sample groups employed in







different occupations and determining the collective or group profile
of each. These profiles illustrate how different occupational
and curricular groups respond to the Kuder items and the patterns
on the 10 interest clusters that they generate. This represents



Figure 14.3 A Sample Kuder Profile*








movement in the direction of occupational keying, but on scores
for entire scales rather than on individual items as is true for the
Strong.

In terms of internal reliabilities, the Kuder has been found to
be as reliable as the Strong. Test-retest reliabilities on the
Kuder E range from the low .70s to low .90s. Correlations between
specific occupationally keyed scores on the Strong and interest
cluster scores on the Kuder (earlier versions of both tests) range
from .25 to .73 (Triggs, 1944). For example,
computational interest scores correlate ^5 w i 
scientific interest scores correlate 73 


The OVIS was designed specifically for me 
selors and their students in “ ,q,o ,, published by 

peered in 1966 and was standardized '"Jf 'X,l sef it is 
Hamourt Brace Jovanoiich ” ^milar to the Kuder and 

similar to the Strong and in some y conceptually on a 

m some ways different from (DOT) for 

system used in the the «tent to which an occupa 

classifying occupations in term h (D Costa and 

tion involves data people or t g . ] 

Winefordner, 1969) call this the cubistie model 

Content. The DOT has classified 21,741 jobs according to
their Data-People-Things involvement and has grouped them into 114
homogeneous areas called worker trait groups. From these groups
the authors of the OVIS developed 24 interest scales, each of
which is assigned a three-digit designation. The first digit reflects
involvement of the occupational group measured by the scale with
data, the second with people, and the third with things. Degree of
involvement can range from a low of 0 to a high of 2.*
Some of these occupational interest categories and the model
they fit into are illustrated in Figure 14.4.
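The three-digit designation can be read mechanically, as the following Python sketch suggests. The function name `decode_dpt` and the dictionary layout are hypothetical; the only facts taken from the text are the digit order (data, people, things) and the 0-to-2 range of involvement.

```python
def decode_dpt(code):
    """Split a three-digit OVIS-style designation into its involvement levels.

    Digit order is data, people, things; each digit runs from 0 (low) to 2 (high).
    """
    if len(code) != 3 or any(ch not in "012" for ch in code):
        raise ValueError(f"expected three digits in the range 0-2, got {code!r}")
    data, people, things = (int(ch) for ch in code)
    return {"data": data, "people": people, "things": things}


print(decode_dpt("210"))  # high data, middle people, low things involvement
```

Because each of the three digits can take three values, the scheme defines a 3 x 3 x 3 grid of possible locations, which is the cube that gives the model its name.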

1 The 24 OVIS scales and f ““f” "IlS 

work (001) !!;?*' "I'ocafS (100) nJSng'and related 

or animals (Oil) ^ customer sennees ( timing (120) 

and precise operations (10iU_„_^^ P^tsond Spiled 

technical services ( , (200) appraisal (201) manasement and 

^2®%n2TTr?mat.?n and S ^^ 20 ) 

technolo^ ( P (212) ®^l^ 0 ^(eaching counseling and social 

supervision (2101 grformmg arts (220) teacm g 

entertainment and P 222 ) 

work (220) and medical ( 


Ohio Vocational Interest Survey (OVIS)





Figure 14.4 The Cubistic Model of Vocational Interests and the Location of
Some Sample OVIS Scales*

* Adapted from the OVIS Manual for Interpreting, copyright © 1969 Harcourt
Brace Jovanovich, Inc. Adapted by permission of the publisher.

Each of the 24 scales on the OVIS contains 11 items. Like the
Kuder, the items represent activities for the student to respond to,
but unlike the Kuder, the items appear singly rather than in
groups of three. Students use the five-point scale like very much,
like, neutral, dislike, dislike very much to react to each item.
(This response scale is similar to that used on the Strong with the
addition of the two extreme categories.) Some items like those on
the OVIS are offered below for illustrative purposes.

• Repair tables and chairs
• Write programs for a computer
• Read books and magazines to sick people
• Make drawings for children's readers
• Raise vegetables





Scoring. In most respects, scoring the OVIS is similar to scoring
the Kuder. The items on the OVIS are grouped into scales as
they are on the Kuder. The determination of which items belong
to which scale was based originally on DOT descriptions (that is,
of traits that seemed to fit each scale). Item analysis was then
done to assure that items on the same scale were internally consistent
with one another and different from items on other scales.
Based on the student's responses, he or she gets one score for each
of the 24 scales (on the Kuder it is one score for each of 10 scales),
reflecting liking and disliking of activities that fit on that scale.
These scores are presented as raw scores and as percentiles, the
latter based on a large standardization sample. The five response
alternatives are scored 1 to 5, with 5 being the most positive.
Agreement with all 11 items on a scale yields a raw score of 55;
agreement with none yields a raw score of 11.
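The raw-score arithmetic can be checked with a short Python sketch. The names `RESPONSE_POINTS` and `scale_raw_score` are hypothetical, and the assignment of 4, 3, and 2 points to the three middle responses is an assumption that follows naturally from the 1-to-5 scoring described above.

```python
# Points per response: 5 is the most positive, 1 the least (middle values assumed).
RESPONSE_POINTS = {
    "like very much": 5,
    "like": 4,
    "neutral": 3,
    "dislike": 2,
    "dislike very much": 1,
}
ITEMS_PER_SCALE = 11


def scale_raw_score(responses):
    """Sum the points for the 11 responses that make up one scale."""
    if len(responses) != ITEMS_PER_SCALE:
        raise ValueError(f"a scale has {ITEMS_PER_SCALE} items, got {len(responses)}")
    return sum(RESPONSE_POINTS[response] for response in responses)


print(scale_raw_score(["like very much"] * 11))     # 55, the maximum
print(scale_raw_score(["dislike very much"] * 11))  # 11, the minimum
```

A student who marked every item on a scale "neutral" would land at 33, the midpoint of the 11-to-55 range.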

While the OVIS is scored like the Kuder, it differs from the
Kuder in that its scores relate to occupations rather than to
orientations like the Kuder. Moreover, by relating to occupational
clusters, OVIS scores tell a student something about his or her
interest in an entire group of occupations that share a common
relationship to data, people, and things.

^ I. nvts reflect an individual s rela 

Application Results on 

tive interests m to the mterest m that occupa 

interest m each occupation relati information can be 

tion by members of the ^^/awareness By following 

used to lead into a ® , .„j i,s„ng of occupational infor 

scores into the DOT with its his or her tested 

mation and orientations thestu c„ occupations The OVIS 

interests and relate them to the as the eighth with 

can be used with students at 8*^® ® nvi<? clearly attempts to span 
adequate reliability Moreover t ® restricting itself to 

the entire occupational spectrum , . allege education 

those occupations usually associated with a cot g 

This self-assessment kit was developed by John L. Holland of
the Johns Hopkins University Center for Social Organization of
Schools. It is published by Consulting Psychologists Press. It is
intended to help students discover the occupations that they are best
suited for by providing them with a description of their own
personality orientation and with a list of classified careers.


Self-directed Search (SDS)





Content. The student first fills out the assessment booklet,
which contains sections dealing with Occupational Daydreams,
Activities, Competencies, Occupations, and Self-estimates. In these

and self estirtes occupations 


ahtv*tvne”®o Hollands six person 

rLfenL=S‘'ge".‘^h""r™"^ 

his or her persona ity te„denc?es anf.h™^ '"*cating 

example a Mudent Ugh, “T scored 

conventional ^ social-enterprising 
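One way to picture how a profile on Holland's six types turns into a label such as social-enterprising-conventional is the Python sketch below. It is an illustration under assumptions, not the SDS scoring instructions: the totals are invented, the helper `summary_code` is hypothetical, the choice of the three highest types is assumed from the three-part example, and ties are broken alphabetically only to keep the output deterministic.

```python
def summary_code(totals, length=3):
    """Return the highest-scoring types, in descending order, joined by hyphens."""
    ranked = sorted(totals, key=lambda name: (-totals[name], name))
    return "-".join(ranked[:length])


# Invented totals across the six personality types, for illustration only.
totals = {
    "realistic": 12,
    "investigative": 20,
    "artistic": 15,
    "social": 41,
    "enterprising": 36,
    "conventional": 30,
}
print(summary_code(totals))  # social-enterprising-conventional
```

The same label can then be looked up in a list of occupations classified under the identical six-type system.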


ready to mlUh"themsdves"^J jobs 

nation Fin/tA,._ i..* _ ^ comes with an Occu 


pation Finder— a hstme of 4^1 ^ comes ^ 

9540 of the labor wVore^r h 7,hL''‘'=^ representing 
been classified in terms of the nersonal..,, occupations has 
the same classification s 


been classified in terms of the oerion^v . occupations has 
he same classification system 77i U ''I™' '■^’“‘rc “sing 

let for the students to Xsc a«=«nient book 

patrons Finder gives other pertinent, ofidition the Occu 

classifies Pertinent information about the jobs it 


tests because 


not typically frightened by «7,hU V “® ' 

•bey give 1 , ,0 themselves sc7 u U''^ ^1' '«ts becaus, 

Pret It themselves ” •bemselves and usually inter 


Career Maturity Inventory (CMI)


Utnt'5 SU??'*-" rrcITr 

H'll PresUabiy I 7 '^bfoni.UTL B7“', 

mcreases^FA^ a Persons readiness tn m i! 

are Provided toUwrtT Partieurrtyi? Ide“'' 

vant program^! l.u t^adiness i ®^erienf 

-ur..yhecom7m~-ron the mSutSUf“a" 
Content The CMt a 

‘--false 





tion until I'm out of school"),* (2) the student's orientation toward
work (meaningful or drudgery; e.g., "Work is dull and
unpleasant"), (3) the student's independence in decision making
(e.g., "I plan to follow the line of work my parents suggest"), (4)
the student's preference for career choice factors (e.g., "You
should choose a job that allows you to do what you believe in"),
and (5) the student's conceptions of the career choice process
(e.g., "You get into an occupation mostly by chance").

The CMI Competence Test measures cognitive variables involved in
choosing an occupation. These are (1) how well the student can
appraise his or her job-related strengths and weaknesses
(Self-Appraisal, Part 1: Knowing Yourself), (2) how much he or she
knows about the world of work (Occupational Information, Part 2:
Knowing about Jobs), (3) how adept he or she is in matching personal
characteristics with occupational requirements (Goal Selection,
Part 3: Choosing a Job), (4) how foresightful he or she is in planning
for a career (Planning, Part 4: Looking Ahead), and (5) how
effectively he or she can cope with the problems that arise in the
course of career development (Problem Solving, Part 5: What Should
They Do?). A sample from the 100 competence items appears below*
(from Part 1).

Ollie has enjoyed drawing pictures at home. He hangs them in his
room and shows them to his [...] in an exhibit. His art teacher told
him they were not as good as those of the other students.

What do you think?

a. His art teacher is the best judge.
b. He should get somebody else's opinion.
c. He should [...] teacher.
d. He likes to draw, so he's probably good at it too.
e. Don't know.

The Attitude Scale is estimated to take 30 minutes and the
Competence Test 20 minutes per part.

9 These and other illustrative items are actual items from the CMI.
Reproduced by permission of the publisher, CTB/McGraw-Hill, Monterey,
CA 93940. Copyright © 1973 by [...].

* Items were designed to be [...] by sixth graders. Answer: [...]





Scoring. Scores on the CMI are presented on a career maturity
profile, which provides (a) a raw score, (b) a percentile score, and
(c) a right response record for the Attitude Scale and for each of the
five parts of the Competence Test. (Other types of summary reports
are also available.) Percentile scores are based on a limited norming
group or standardization sample (particularly in the case of the
Competence Test), but additional norms are promised for the future.

[...] in an early and experimental [...] on its validity and
reliability is still pending. [...] research on vocational
development and [...] Pattern Study [...] maturity in adolescence was
formulated (Crites, 197[.]). [...] identified career competenc[...]
contributing to career development (the others being consistency and
realism of career choices).

[...] program or a teacher involved in [...] provide useful
information for diagnosing individual developmental needs. Until its
validity and reliability have been [...], results must be interpreted
cautiously. Users [...] for the concepts of career maturity [...]
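The percentile scores mentioned in the CMI scoring description depend entirely on the norming group, which is why a limited standardization sample calls for caution. The sketch below illustrates one common convention for a percentile rank (percent of the norming group scoring lower, with ties counted as half). The function name and the ten-student norming group are assumptions made for illustration; they are not taken from the CMI manual or its norm tables.

```python
def percentile_rank(raw_score, norm_scores):
    """Percent of the norming group scoring below raw_score, ties counted as half.

    This is one common textbook convention, not the publisher's procedure.
    """
    if not norm_scores:
        raise ValueError("norming group is empty")
    below = sum(1 for s in norm_scores if s < raw_score)
    equal = sum(1 for s in norm_scores if s == raw_score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical raw scores from a small norming group (invented data).
norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 33]
print(percentile_rank(22, norm_group))  # 55.0
```

With so few students in the norming group, moving a single norm score shifts the percentile by several points, which is the practical meaning of a limited standardization sample.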


USING EXISTING SCALES TO MEASURE ATTITUDES


There are many different [...] that how he or





she feels about himself or herself affects the way he or she
performs in school (Tuckman and Bierman, 1971). Moreover, many
school systems, programs, and teachers consider the enhancement
of a student's self-image to be one of the general goals of
education. There are some students whose self-image tends to be
particularly negative, the disadvantaged or deprived (Tuckman,
1969); in their case the improvement of self-image becomes a
particularly important objective of the educational process.

Attitudes like self-image fall into the affective domain (discussed
in Chapter 6), and measurement in this area is not as well developed
nor as widespread in use as measurement in cognitive areas. This is
partly because of measurement difficulties (such as tendencies to
fake or respond on the basis of social accept[...], described in
Chapter 8) and partly because of [...] acceptance of objectives in the
affective domain as [...] the schools. (However, acceptance of [...]
a tendency to increase.) Some appropriate measures are described
below.

Tennessee Self-concept Scale

This measure of how students view themselves comes in two forms, a
Counseling Form and a Clinical and Research Form. Both forms contain
the same items, but the latter yields more scores. Only the former
will be described here.

Content. The scale contains 100 items (which appear in scrambled
order in the test [...]). [...] presents the student with a
self-description, some of which are illustrated below (by permission).

• I sometimes do very bad things.
• I try to play fair with my friends.
• I am neither too tall nor too short.
• I am satisfied with my moral behavior.

Some of the items are written in a positive direction, reflecting a
positive self-concept, [...] a negative direction, indicating a
negative self-concept. The student uses the following five-point
scale to respond to each [...]:

completely    mostly    partly false and    mostly    completely
  false       false       partly true        true        true




The classification of the 100 items into subgroups makes possible
the many types of scores described below.

[...] five types of scores, most important [...] the Total P Score
[...]; that is, it reflects a student's overall level of self-esteem.
It is based on the number of positive self-[...] ("[...] my family
relationships"), and people who [...] ("I am a nobody"), [...]
("[...] could be more trustworthy"), often feel anxious [...] ("[...]
losing my mind"), and have little [...] confidence in themselves ("I
am no good at all from a [...] point").

In addition to the Total P Score, [...] provided for (1) Identity ("I
am a ha[...]"), (2) Self-satisfaction ("I feel good [...]"),
(3) [...] ("[...] poorly [...] games"), (4) Physical Self ("[...] a
healthy body"), (5) Moral-Ethical Self ("I am a m[...]"),
(6) Personal Self ("I am as smart as I want to be"), (7) [...] ("[...]
treat my parents as well as I should"), and (8) [...] ("[...] ought to
get along better with other people").

[...] scores obtained are (1) self-criticism [...]

Application. [...]





likes and dislikes. Students usually take less than 20 minutes to
complete the test and the counselor less than 10 minutes to score
it. Its results can help teachers and counselors reach out to
individual students to help them overcome negative feelings toward
themselves. The Tennessee Self-concept Scale is published by
Counselor Recording and Tests. This publisher also offers the
Piers-Harris Children's Self-concept Scale (1969 edition), an 80-item
inventory for use with students from grades 3 to 12. This scale is
somewhat less clinical than the Tennessee and simpler in both
scoring and interpretation (but, at the same time, provides less
detailed information). Items (responded to "yes" [...]) [...] with the
way children feel about their (1) [...], (2) [...]tual and school
status, (3) physical appearance and [...], (4) anxiety,
(5) popularity, and (6) [...]. [...] from a group of over 1,000 fourth
through [...] provided. This scale is recommended for research [...]
classroom screening device is also possible.

Self-appraisal Inventory

The Self-appraisal Inventory appears in a collection of [...] entitled
Measures of Self-concept K-12 (Revised Edition) [...] in three
versions: a Primary Level version for [...], an [...] Level version
for grades 4-6, and [...] version for grades 7 and over. All three are
group a[...]. [...] Primary Level version is administered [...]
readers. (The answer sheet [...] picture coded.) Unlike the Tennessee
[...], [...] provides for the reporting and interpretation of
individual scores for individual students, the Self-appraisal
Inventory [...] typically used only for the reporting of group data
(although individual reporting may be done) [...] useful for
evaluating instructional experiences rather [...] evaluating
individuals. The Self-appraisal Inventory can be [...] the
Instructional Objectives Exchange.

Content. The Self-appraisal Inventory is written to fit the
following four objectives:

(1) students will [...] self-concepts in the peer dimension [...]
(disagreement) with statements [...] positive (negative) perceptions
of the self in social situations (e.g., "Other children are
interested in me," "Other children are often mean to me");

(2) students will display positive self-concepts in the school
dimension by indicating agreement with statements that describe
scholastic success or esteem of self in school and disagreement
with statements that describe school failure or lack of achievement
(e.g., "School work is fairly easy for me," "I forget most of what I
learn");

(3) students will display positive self-concepts in the family
dimension by indicating agreement (disagreement) with statements
[...] perceptions of self in [...] relationships or situations (e.g.,
"I do my share of [...]," "I often get in trouble at home," "My family
understands me," "I feel that my family doesn't usually trust
[...]");

(4) [...] indicating agreement [...]


[...] written in the form of questions to which children respond
[...]. [...] Intermediate Level version has [...] and the [...]
version 62 items, each of which is in statement [...] responded to as
"true" or "untrue." All three versions [...] to measure the four
objectives (namely, peer, school, [...]), [...] each may be subscored
[...].

[...] the [...] and adding to it the number of negative [...]
responded to as "untrue." [...] test-retest [...] range from .73 to
.87 for the three forms.
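The scoring rule for the true/untrue versions can be sketched in a few lines. The surviving text states only that negative items answered "untrue" are added to the score; the sketch assumes the other half of the count is positive items answered "true". The item numbers are hypothetical placeholders, since the actual scoring key is given in the published source book.

```python
def score_self_appraisal(responses, positive_items, negative_items):
    """Count responses that indicate a positive self-concept.

    responses: dict mapping item number -> "true" or "untrue"
    positive_items, negative_items: item numbers worded in each direction
    (placeholders here; the real key comes from the source book).
    """
    agreed_positive = sum(
        1 for item in positive_items if responses.get(item) == "true"
    )
    rejected_negative = sum(
        1 for item in negative_items if responses.get(item) == "untrue"
    )
    return agreed_positive + rejected_negative


# Hypothetical four-item form: items 1 and 3 positively worded, 2 and 4 negatively.
answers = {1: "true", 2: "untrue", 3: "untrue", 4: "true"}
print(score_self_appraisal(answers, positive_items=[1, 3], negative_items=[2, 4]))  # 2
```

Passing only the item numbers that belong to one dimension (peer, school, or family, for example) gives the corresponding subscore under the same rule.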

Application. The [...] may buy the source book and [...]





Attitudes toward School: The School Sentiment Index

A second area of important attitudes is attitudes toward school, in
terms of giving the teacher feedback about the positiveness of
educational experiences for his or her students. An instrument
designed for this purpose is the School Sentiment Index (SSI), which,
like the Self-appraisal Inventory, comes in versions for primary,
intermediate, and secondary levels. It is distributed in
uncopyrighted form by the Instructional Objectives Exchange.

Content. The SSI was developed to measure the following
objectives:

(1) students will indicate positive attitudes toward teachers by
responding "yes" ("no") to statements reflecting positive (negative)
aspects of teacher behavior in terms of
   a. adequacy and fairness of instruction and grading (e.g., "I
      usually get the grade I deserve in a [...]," "[...] teachers
      give assignments that are too difficult"),
   b. fairness in authority and effectiveness of control (e.g., "My
      teacher treats me fairly," "My teacher bosses the children
      [...]"), and
   c. [...] (e.g., "[...] what I have to say," "Many of my teachers
      have [...]");

(2) [...] interest and/or involvement (or lack of them) in learning
[...] ("[...] very much to do homework"), [...] c. extra [...];

(3) students will indicate positive attitudes toward the school [...]
by expressing agreement (disagreement) with statements relating
positive (negative) student perceptions of the
   a. bureaucracy (e.g., "[...] school has too many rules"),
   b. school organization (e.g., "Students have a voice in
      determining how this [...]"),
   c. traditions (e.g., "In order to win an office in this school
      you've got to be [...] the right crowd"), and
   d. activities (e.g., [...]);

(4) students will indicate positive attitudes [...] school by
expressing agreement (disagreement) with statements describing
positive (negative) aspects of
   a. the openness and fairness of friendship patterns (e.g.,
      "School is a good place for making friends"),
   b. friendliness (e.g., "The other children in my class are not
      friendly toward me"),
   c. social distance (e.g., "Other children bother me when I'm
      trying to do my school work"), and
   d. stratification (e.g., "Older children often boss my friends
      and me around at my school");

(5) students will indicate positive attitudes toward school in
general by expressing agreement (disagreement) with statements
describing the positive (negative) aspects of
   a. holding power of the school (e.g., "It is clear to me why I
      shouldn't drop out of school"),
   b. being in school vs. remaining home from school (e.g., "I
      enjoy learning in school more than learning on my own"), and
   c. going to school (e.g., "Each morning I look forward to coming
      to school").

The primary level version has 37 items in question form that
are presented orally and responded to by either "yes" or "no." The
intermediate and secondary level versions have 81 and 82 items,
respectively, in statement form that are self-administering and
responded to, respectively, as "true" or "untrue" and "strongly
agree," "agree," "disagree," and "strongly disagree."


Scoring. The SSI is easily hand scored. Usually a single, total
score is sufficient; however, if desired, separate scores may be
computed for each objective, since the book containing the tests lists
the item numbers that fit each objective for each form. Test-retest
reliability coefficients are reported for the Primary, Intermediate,
and Advanced Level versions as .87, .83, and .49, respectively (the
last being quite low).
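The total score and the optional per-objective scores described above can be computed from the same lookup of item numbers. The following is a minimal sketch for the yes/no (primary level) response format; the objective labels, item numbers, and keyed answers are invented placeholders, since the real item lists appear only in the book containing the tests.

```python
def score_ssi(responses, key_by_objective):
    """Return (total, subscores) for a yes/no attitude form.

    responses: dict mapping item number -> "yes" or "no"
    key_by_objective: dict mapping objective name -> {item number: answer
        that reflects a positive attitude}. Placeholder key, not the
        published one.
    """
    subscores = {
        objective: sum(
            1
            for item, positive_answer in keyed_items.items()
            if responses.get(item) == positive_answer
        )
        for objective, keyed_items in key_by_objective.items()
    }
    return sum(subscores.values()), subscores


# Hypothetical key: items 2 and 4 are negatively worded, so "no" is the positive answer.
key = {
    "teachers": {1: "yes", 2: "no"},
    "peers": {3: "yes", 4: "no"},
}
answers = {1: "yes", 2: "yes", 3: "yes", 4: "no"}
total, by_objective = score_ssi(answers, key)
print(total, by_objective)  # 3 {'teachers': 1, 'peers': 2}
```

Keeping the key grouped by objective means the single total and the separate objective scores always agree, because the total is simply the sum of the subscores.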


Application. Webb et al. (1966) suggest the use of a morale [...]
of attitudes toward school (in [...] of [...] achievement). The SSI
can serve [...] program evaluation. Tuckman [...], for example,
showed that students in [...]





MEASURING PERSONALITY ORIENTATION 


In this section orientation can be taken to mean any disposition or
predisposition a person has vis-à-vis his or her needs, values, likes,
or dislikes. When these orientations form a cluster, the word
personality will be used to label or describe it; hence, personality
orientation.

With the increasing concern for the education of the whole
person, teachers need to be aware of individual differences and the
development of character and values. Some types of measurement
in the area of personality orientation are described below. (However,
since personality is a core part of the person, teachers must
exercise caution in measuring it.)*


* Teachers may want to discuss their interests in and use of
personality measurement with [...] psychologist and/or school
administrator.

California Psychological Inventory (CPI)

The CPI is a measure of a broad [...] characteristics (18 in all)
associated with favorable and positive aspects of human behavior
rather than the pathological. [...] intended for persons 16 years of
age and older. It is published by Consulting Psychologists Press.

Content. The CPI contains 480 items, which fall into the following
18 scales:

Class I. Measures [...]

(1) Dominance (Do): [...] aggressive, persuasive, independent, [...]
have leadership potential from those who are [...] self-confidence.

(2) Capacity for Status (Cs): distinguishes those who are ambitious,
resourceful, [...] self-seeking, effective, and broad in interests
from [...] stereotyped in thinking, restricted in [...], and awkward
in unfamiliar situations.

(3) Sociability (Sy): distinguishes those who are outgoing,
competitive, [...] original from those who are awkward, submissive,
unassuming, passive, and overly influenced by others.

(4) Social Presence (Sp): distinguishes those who are clever, [...]
quick, [...] ebullient from those who are deliberate, self-restrained,
vacillating, and unoriginal.

(5) Self-acceptance (Sa): [...] sharp[...] willed, self-centered, and
self-confident from those who are conservative, conventional, narrow,
and self-abasing.

(6) Sense of Well-being (Wb): distinguishes those who are energetic,
versatile, productive, and who value work from those who are
unambitious, cautious, self-defensive, and constricted.

Class II. Measures of Socialization, Maturity, and Responsibility

(7) Responsibility (Re): distinguishes those who are planful,
independent, dependable, efficient, and moral from those who are
lazy, spiteful, impulsive, and moody.

(8) Socialization (So): distinguishes those who are honest,
industrious, steady, conscientious, and conforming from those who
are defensive, resentful, stubborn, deceitful, and given to excess.

(9) Self-Control (Sc): distinguishes those who are calm, patient,
self-denying, deliberate, thorough, and honest from those who are
impulsive, irritable, uninhibited, and hedonistic.

(10) Tolerance (To): distinguishes those who are permissive,
accepting, and nonjudgmental from those who are narrow, distrustful,
and overly judgmental.

(11) Good Impression (Gi): distinguishes those who are concerned
with and capable of creating a favorable impression from those who
are cool, distant, and little concerned with others.

(12) Communality (Cm): distinguishes those who are dependable,
moderate, steady, and "normal" from those who are changeable,
nervous, and have problems.

Class III. Measures of Achievement Potential & Intellectual
Efficiency

(13) Achievement via Conformance (Ac): distinguishes those who are
cooperative, organized, sincere, persistent, and [...] achievement
from those who are stubborn, opinionated, dis[...].

(14) Achievement via Independence (Ai): [...] are mature,
self-reliant, and have superior [...] and ability from those who are
submissive, compliant, wary [...].

(15) Intellectual Efficiency (Ie): [...], and lack self-discipline.

Class IV. Measures of Intellectual and In[...]

(16) Psychological Mindedness (Py): [...] interested in and
responsive to the [...] states of others from those who are slow,
delib[...].

(17) Flexibility (Fx): distinguishes [...] informal, adaptable,
rebellious, sarcastic, and [...] from those who are methodical, rigid
[...].

(18) Femininity (Fe)¹⁰: [...] those who are appreciative, patient,
[...] sympathetic from those who are hard-headed, ambitious,
opportunistic, and blunt.

The CPI items are self-descriptive statements (e.g., "When I get
bored I like to stir up [...] excitement" and "I am afraid of deep
water")¹¹ that are answered either "true" or "false."

Scoring. A profile of standard scores on the 18 scales is available.
Reliabilities per scale [...] students are not as high as for
achievement tests but still reasonably high.

10 Since this test [...] originally published, the idea of separating
traits by [...] acceptable.

11 [...] Manual for the California [...] Ph.D., copyright 1956.
Pub[...]





Application. The CPI has been fairly widely used, and a substantial
literature on it is available. In those cases where a broad picture
of students' personality is needed, it will easily suffice. It is
usually beyond the scope of typical classroom use but might be a
valuable counseling aid or a vehicle for promoting self-awareness
in conjunction with a program such as career education. (More
specific uses are suggested in the last section of this chapter.)

California Test of Personality

[...] be confused with the CPI [...] published in 1942 and revised
in [...] is published by [...].

[...] which fall into the categories of (1) personal adjustment and
(2) social adjustment. The personal [...] a. self-reliance
(independent, self-[...] stable, responsible), b. sense of personal
worth (being well regarded, of being capable, attractive, and of
having faith in one's future success), c. sense of personal freedom
(determination [...] control over one's fate), d. feeling of
belonging (experiencing love, friendship and cordiality, getting
along [...]), e. withdrawing tendencies (fantasy [...] being
self-concerned), and f. nervous symptoms ([...] chronic fatigue and
so forth, indicating [...]) [...]





adjustment, (b) social adjustment, and (c) total adjustment in the
form of standard scores and percentile ranks. The test takes about
45 minutes to administer.

Application. The focus of the California Test of Personality
is on adjustment. It has different forms that may be used with
students from kindergarten through college and gives an indication
of their level of adjustment, both personal and social. By asking
students questions about their life and perceptions (e.g., "Do
children at school ask you to play games with them?"), those with
adjustment problems can be identified.


[...] Reproduced by permission of the publisher, CTB/McGraw-Hill,
Monterey, CA 93940. Copyright © [...] by McGraw-Hill, Inc.

Edwards Personal Preference Schedule (EPPS)

[...] "no" type personality inventories, [...] the Edwards [...]
choices are presented in equally [...] pairs from which the
respondent must choose one. By equating the desirability of each
choice in the pair, the tendency to distort can be minimized. (See
the sample item below.)

Content. The 15 EPPS scales are as follows: (1) achievement (ach):
to do one's best, to [...] recognition, to solve difficult problems,
to do [...]; (2) deference (def): to follow instructions, [...]
conform, to let others make decisions, to do what is expected;
(3) order (ord): to be neat and organized, to plan before [...],
[...] arrange; (4) exhibition (exh): to be witty and clever, to
[...], to be the center of attention, to talk; (5) autonomy (aut):
[...] what you want when you want, to criticize, to [...], to be
unconventional; (6) affiliation (aff): [...] friends, and share and
participate with them, to avoid being alone, to form strong
attachments; (7) intraception (int): [...] yourself, to observe,
judge, and understand others, to predict what others will do;
(8) succorance (suc): to be helped, encouraged, [...] understood by
others, to receive help and affection; (9) dominance (dom): to be the
leader, to make decisions, to influence, supervise, and direct
others; (10) abasement (aba): to feel guilty, to accept blame, to
need to be punished, to feel timid and inferior, to need to confess;
(11) nurturance (nur): to help others, to be kind and sympathetic, to
forgive, to be generous, to show affection; (12) change (chg): to
experience novelty, experiment, do new and different things;
(13) endurance (end): to keep at a job until it is finished, to work
long [...]; (14) heterosexuality (het): to go out with, interact
with, be seen as attractive by [...]; (15) [...].

[...] pair is illustrated below:¹²

[...] friends to encourage me when I meet with failure.
[...]

[...] to be taken care of by others). The [...] scale (i.e., wanting
[...]) [...] a or b. The respondent must choose either [...]

[...] of the needs or [...] converted to a percentile rank using
tables [...] results plotted on a profile [...] predominance of each
need with respect to [...]

[...] measures, the EPPS provides a personality [...] of an
individual [...] monitor the effectiveness [...]

12 [...] Copyright 1953 by The Psychological Corporation, New York,
N.Y. All rights reserved.

[...] like the Kuder [...] The EPPS is [...] ipsative measure.
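The forced-choice format and the footnote's description of the EPPS as an ipsative measure can be made concrete with a short sketch. Each pair sets one need against another, and the score for a need is the number of times its statement is chosen. The three pairs and the choices below are invented for illustration and are not actual EPPS items or keys; the conversion of counts to percentile ranks requires the publisher's tables and is not shown.

```python
from collections import Counter


def score_forced_choice(pairs, choices):
    """Count how often each need is chosen across forced-choice pairs.

    pairs: list of (need_for_statement_a, need_for_statement_b)
    choices: list of "a" or "b", one per pair
    """
    if len(pairs) != len(choices):
        raise ValueError("one choice is required for every pair")
    counts = Counter()
    for (need_a, need_b), choice in zip(pairs, choices):
        if choice == "a":
            counts[need_a] += 1
        elif choice == "b":
            counts[need_b] += 1
        else:
            raise ValueError("each choice must be 'a' or 'b'")
    return counts


# Hypothetical pairs using three of the scale abbreviations listed above.
pairs = [("suc", "aff"), ("ach", "suc"), ("aff", "ach")]
choices = ["a", "b", "a"]
counts = score_forced_choice(pairs, choices)
print(counts["suc"], counts["aff"], counts["ach"])  # 2 1 0
print(sum(counts.values()) == len(pairs))           # True
```

Because every pair awards exactly one point, the counts always sum to the number of pairs. A high score on one need therefore forces lower scores elsewhere, so the profile shows the predominance of each need relative to the person's other needs rather than its absolute strength.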





Scale of Values*

The Scale of Values, which first appeared in 1931, attempts to
categorize a student's preferred value orientation as a basis for
increasing self-awareness, providing guidance, or individualizing
experiences. It is published by Houghton Mifflin.

Content. Like the EPPS, the Scale of Values uses the forced-choice
format, asking the student to indicate [...] group of choices he or
she prefers. [...] he or she can weight preferences [...] a "3" [...]
marked preference, or a "2" versus a "1" for a slight [...] must rank
[...] preference for each [...] 4 (most [...] value orientations:

(1) theoretical: orientation toward the scientific and the abstract
(2) economic: orientation toward business and finance
(3) aesthetic: orientation toward beauty and the arts
(4) social: orientation toward other people, group situations, [...]
(5) political: orientation toward government and decision [...]
(6) religious: orientation toward the moral and spiritual

Examples of an item from Part I (for which there are 2 answer
choices) and one from Part II (which has 4 answer choices) appear
below:

• When [...] ceremony (ecclesiastical or [...] into office, etc.),
  are you more impressed
  a. by the color and pageantry of the occasion itself;
  b. by the influence and strength [...]
  [Choice a [...]]

• If you lived [...] and had more than enough income for your needs,
  would you prefer to
  a. apply it [...] commercial and industrial development
  b. contribute to the advancement of the activities of local
     religious groups
  c. give it for the development of scientific research in your
     locality
  d. give it to the Family Welfare Society
  [Choice a is economic, b is religious, c is theoretical, d is
  social.]

* Sample items [...] the Scale of Values, Allport, Vernon, Lindzey,
are reproduced by permission [...] Houghton Mifflin Company.

Scoring. The Scale of Values is designed so that in 5 to 10
minutes the student can compute his or her numerical score for
each value orientation. Correction figures are then used to reflect
the relation of the scores to normative choices; the final
scores are plotted on a Profile of Values.
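The arithmetic a student performs when self-scoring can be sketched as a running total of points per value orientation. The sketch assumes that a Part I item splits 3 points between its two choices (3 and 0 for a marked preference, 2 and 1 for a slight one) and that a Part II item ranks its four choices from 4 (most preferred) to 1. The keying of the Part I item to "aesthetic" and "political" is a hypothetical placeholder; the Part II keying follows the bracketed note in the sample item. The correction figures mentioned above come from the published materials and are not applied here.

```python
def total_value_scores(item_points):
    """Sum the points a student assigned to each value orientation.

    item_points: list of dicts, one per item, mapping the value
    orientation of each choice -> points given to that choice.
    """
    totals = {}
    for points in item_points:
        for value, score in points.items():
            totals[value] = totals.get(value, 0) + score
    return totals


# Part I style item: slight preference for choice a (2 points versus 1).
part1_item = {"aesthetic": 2, "political": 1}  # keying assumed for illustration
# Part II style item: theoretical ranked first (4), economic last (1).
part2_item = {"economic": 1, "religious": 2, "theoretical": 4, "social": 3}

totals = total_value_scores([part1_item, part2_item])
print(totals["theoretical"], totals["aesthetic"])  # 4 2
```

As with the EPPS, the points available on each item are fixed, so the resulting profile describes the relative strength of a student's values rather than their absolute level.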

Application. The greatest value of the Scale is probably in
enhancing self-awareness and increasing a student's sensitivity to
values. In addition, it is possible that the Scale of Values
contributes to the process of affective instruction itself.


Embedded Figures Test (EFT)

The EFT, developed by Herman Witkin (1950), is a very different
kind of personality orientation test from those described previously.
It utilizes complex perceptual figures with simpler perceptual
figures embedded within them.

Content. The EFT consists of 24 complex figures and 10 simple
figures (such as the ones shown in Figure 14.5; the complex
figures are in color; the simple figures appear as shown).

[Figure 14.5 Sample figures from the Embedded Figures Test.* Panels:
Complex Figure; Simple Figure.]

* Reproduced by permission of Herman A. Witkin.





Box 14.1

A COMPENDIUM OF TEST EVALUATIONS

Tests in the affective and interpersonal areas can be difficult to
locate since many are highly experimental and therefore scattered
throughout the literature. To make their location and subsequent
evaluation easier, the Center for the Study of Evaluation, in
collaboration with Research for Better Schools, Inc., has published a
compendium (1972), which includes tests in affective and
interpersonal areas.* In addition to listing a large number of tests
along with their classification, age level, and [...], the compendium
provides a comprehensive evaluation of each. [...] provides
information about and a rating (good-fair-poor) of each [...]
(1) validity, (2) examinee appropriateness, (3) normed [...],
(4) teaching feedback, (5) usability, (6) retest potential, and
(7) ethical [...].

An example is the [...]. [...] self-acceptance and self-regard scales
are both listed [...] 3A3, Self-esteem, Self-judgment. Both are
appropriate [...] and over, and both are supplied by Educ[...]. The
self-acceptance scale is judged [...] self-regard scale poor
validity. In the other ca[...] scales receive equal ratings, namely
good examinee appropriateness, [...] excellence, poor teaching
feedback, good [...] potential. No comments about the ethical p[...]
of either have been raised. These ratings seem to be about the [...]
measures.

* Copies of the CSE-RBS Test Evaluations, edited by Hoepfner and
others, may be obtained from the Center for the Study of Evaluation,
Graduate School of Education, University of California [...].


Scoring. The [...] (tested individually) is shown the complex
figure alone for 15 seconds and then the simple figure alone for 10
seconds. The simple figure is then removed, and he or she must locate
it within the complex figure. The score is the number of minutes it
takes the [...] to correctly trace the outline of the simple figure
on the complex figure, with an upper limit of 5 minutes.
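The timing rule in the scoring description amounts to capping each tracing time at the 5-minute limit. The sketch below shows that cap for a few invented tracing times. Treating a figure that was never located as scoring the full limit is an assumption of this sketch, as is the function name; the text does not say how the per-figure scores are combined, so no total is computed.

```python
TIME_LIMIT_MINUTES = 5.0


def figure_score(minutes_to_trace):
    """Minutes needed to trace the simple figure, capped at the time limit.

    None means the figure was never located; it is scored at the limit
    (an assumption of this sketch).
    """
    if minutes_to_trace is None:
        return TIME_LIMIT_MINUTES
    if minutes_to_trace < 0:
        raise ValueError("time cannot be negative")
    return min(float(minutes_to_trace), TIME_LIMIT_MINUTES)


# Hypothetical tracing times for three figures.
times = [1.5, 7.0, None]
print([figure_score(t) for t in times])  # [1.5, 5.0, 5.0]
```

Lower scores mean the simple figure was found quickly, which the Application discussion associates with field independence.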

Application. [...] with personality? Witkin et al. ([...], 1962)
have shown that people can be roughly categorized into "field
independent" and "field dependent" on the basis of it. Field
independent types can separate figure from ground, while field
dependent types cannot break apart a figure from its background.
Each of these two types also shows a striking array of
characteristics that are consistent with their ability or inability
to separate figure from ground. On a companion perceptual test, the
Rod and Frame Test (RFT), for instance, [...] the verticalness of
the rod even when the frame is tilted than can field dependents.

[...] perhaps is the relation between scores on the EFT [...]
characteristics. Field independent [...] experience [...] global
[...] articulated [...] their social world. They are [...] likely to
[...] nonconformist (Pemberton, 1952), with a tendency [...]
achievement oriented than are field dependents.

USING PUBLISHED AFFECTIVE MEASURES


Making students aware of their [...] personality orientations may
aid them [...] development. By heightening students' [...] can play
an instructional role in the [...] the kind of feedback required for
[...] career decision making. The kinds of tests described [...]
chapter can be used by classroom teachers at the [...]





The personal and interpersonal growth of students is an area
of growing importance in school. In this chapter we have identified
and described tests that can be useful in stimulating and evaluating
outcomes in that emerging area.


Additional Information Sources 

Beatty, W. H. (Ed.) Improving educational assessment and an
    inventory of measures of affective behavior. Wash., D.C.:
    Association for Supervision and Curriculum Development, NEA,
    [...].
Bonjean, C. M., Hill, R. J., & McLemore, S. D. Sociological
    measurement: An inventory of scales and indices. San Francisco:
    Chand[...].
Buros, [...] (Ed.) [...] The Gryphon Press, 1970.
[...] personality and motivation [...]
[...] health measures. Los Angeles: Human [...]tute, 1973.
Edwards, A. L. The measurement of personality traits. N.Y.: Holt,
    Rinehart & Winston, 1970.
Hoepfner, R., et al. (Eds.) [...] evaluations: Tests of higher order
    cognitive, affective, [...] interpersonal skills. Los Angeles:
    Center for the Study of Evaluation [...] (Box 14.1).
Instructional Objectives Exchange. [...] toward school K-12 (revised
    ed.). Los Angeles: Instructional Objectives Exchange (Box
    24095), 1972.
Instructional Objectives Exchange. Measures of self concept K-12
    (revised ed.). Los Angeles: Instructional Objectives Exchange
    (Box 24095), 1972.
Johnson, O. G., & Bommarito, [...] and measurements in child
    development: A handbook. San Francisco: Jossey-Bass, Inc., 1971.
Rosen, P. (Ed.) [...]. Princeton, N.J.: Educational Testing Service,
    [...].
Super, D. E., & Crites, J. [...] vocational fitness (revised ed.).
    New York: Harper & Row, [...].
Walker, D. K. Socioemotional [...] preschool and kindergarten
    children: A handbook. [...] Jossey-Bass, Inc., 1973.
Wylie, R. C. The self [...] pertinent research literature. Lincoln,
    Neb.: University of Nebraska Press, 1961.





SeJl'test of Proficiency 

On Items requiring recall ol spec, lie lest Wormaticn, you may reler 
to the text 

(1) The IcUovring questions refer to the five career-related tests nameW- 
Strong Vocational Interest Blank, Kuder General Interest Survey, 
Ohio vocational Interest Survey (OVIS), Sell directed Search (SDS) 
and Career Maturity Inventory (CMI) Indicate which test you would 
use In each instance 

a To help students relate their personality orientation to occupations 
for which they are best suited 

b To help students identify the pattern or profile of their interests in 
areas such as mechanical, literary, artistic, scientific, and so on 
c To help students determine the category of occupations within the 
Dicl/onary cf Occupational Titles toward which their interests 
dispose them 

d To help students discover the occupational group with which the 
pattern of their preferences is most similar 
e To help students become aware of their attitudes toward and 
competence for making a career decision 


(2) The Strong, Kuder, and OVIS represent different ways of measuring
a student's interests. As such, they differ on a number of characteristics,
properties, or dimensions on which most tests generally differ
(such as number of items) as well as on properties that differentiate
specifically between interest inventories. Most important among these
differences are those in scoring and interpretation. Describe how the
three tests differ on these two important properties.

(3) The following questions refer to tests of attitudes and self-concept, 
'S'Lcife, ^’r/fi-up p raisdi 'nrveiftoryi 
and School Sentiment Index Indicate which one of the three you 
would use in each instance 

a To discover if the students in your class share positive feelings
about their school experience

b To determine whether an individual student likes himself or herself
and the nature of these feelings

c To find out whether a class or group of students has a positive or
negative image of themselves


(4) It is important to be able to distinguish between attitudes toward self
and toward school. Below is a list of labels, items, descriptions, and
content categories. Mark self next to those which refer to attitudes
toward self and school next to those that refer to attitudes toward
school.

a "People like me."
b "My classmates are smarter than I."
c "Homework should be abolished."
d "My family understands me."
e "My teacher is kind."
f "I am good looking."
g "It's hard to get to school on time."
h "I like to read."


(5) The following items refer to the California Psychological 

(CPI), cal, lorn, a Test of Personahly (OTP), ; 

toward orientations such as the religious, 


«nectrum of personality charac- 
b To classify s'^daPts on a b oad p 
teristics associated with positive asp 

as sociability. seIf<ontrol, and llexM IV 

' wa7rpe"-vo world or a .ore d.llerepbalad, indiv.da- 

e XToo; . be extend .0 wb.ob s.oden.s o, all apes are exper.enc- 

ing personal and social adfusimeni f^r^nnalitv 

think an item measures personality. 

(6) Below are a list of items y ,, „ae!ls. an W, if values, a V 

put a P next to It " a,’ home 

a I have a good sens ^ , „„partake 

b I am organized in act science 

c 1 like to read ^ gantle and patient person 

d I consider myself to by others 

e I often find It necessary te be nep 

t I find that 1 have lost my app 

rnr affective measurement ,n the classroom 

(7) List three --fo = and indicate how 

(8) Select any one of 7® *® room 
it might be used mthe classro 



chapter fifteen /Getting 
the Most 
from the School 
Testing Program 


OBJECTIVES 1 Describe the test item file as a strategy for
improving teacher-built tests.

2 Describe the individual applications of test data,
to include (a) monitoring and certifying student
progress, (b) diagnosing individual strengths and
weaknesses, (c) prescribing instructional experiences,
and (d) providing student feedback.

3 Describe and demonstrate the classroom applications
of test data to improve instructional procedures.

4 Describe the program and system applications of
test data to determine the proficiency of the system
through the use of (a) system input measures,
(b) system output measures, and (c) system process
measures.


A TEACHER-BUILT TESTING PROGRAM: THE TEST ITEM FILE


Since teacher-built tests constitute a major portion of the school
testing program, strategies for improving teacher testing are
obviously desirable. One major strategy is the development of
a test item file. A test item file is a collection of test items, each of
which has been shown to possess the qualities of appropriateness,
validity, reliability, interpretability, and usability.

used. A curriculum or a textbook is not use /j 

carded. Why,. then, shouldatestbe-^^^^^^^^ 

some extent, in reusing a test secuniy y y 
problem that can be solved. See below.) Heine hiehlv 

If we consider the typical teacher- ui However limited 
appropriate and reasonably interpretable 

reliab|ty and validity and 

what the reverse, then the best strate©^ u^Kilitv and validity and 
ground: teacher tests with and infer^rem- 

standardized tests with the prescriptive or cri- 

bility. Testing "rUs^scribed in Chapter 13), 

terion-referenced achievement t ( ^jegrer interpretability 
which has greater 

than Its precursors, but litt^ teacher tests are typi- 

teacher testing. jjscaVded, and test improvement de- 

cally used once and then 

mands test reuse. Teachers prepare test items 

A test item Sustrated in Chapter 3) 

to fit their instructional ^ tests.^ Test results are then 

and use these items m terms of the five criteria 

used to evaluate the are discarded: good items 

as set forth in Chapters ' .-f,ji to put item along with the 

are retained. (It ^ card and keep it in an index card 

objective it measures on an ^ proficiency on a par- 

fili) Each time the teacher wants ^ ^ 

ticular set of The date that an item is used should 

taking items from the ' j gy mixing up items, adding new 
appear on the back o reusing items in successive 

ones every testing ’ ^Ijjjgjjjjnimized. 

testings, the security curriculum resources available to 

When you consiue draining teachers receive in utilizing 
teachers and the amount ot tra g 


leaciicis ini'* — 

or “Hi" “ 

1 Testing orevatotioncompnw^ 
as a source of items. 




these resources, and then consider how few resources, if any, are
made available to assist teachers in testing, the difference is
considerable. Teachers are encouraged to develop their own resources
by developing individual test item files and by pooling their test
item files on a school, grade-level, or even district basis.

Teachers' committees should also be formed to examine and
review published achievement tests and identify those that are
most appropriate for a particular school, grade level, or district.
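
The index-card procedure described in this section (one card per item, the objective it measures on the card, the date of each use on the back, poor items discarded, files pooled among teachers) can be sketched as a small program. This is only an illustrative sketch: the names ItemCard, ItemFile, and build_test are hypothetical, and the rule of drawing the least recently used cards first is an assumption made here to show one way of avoiding reuse in successive testings.

```python
from dataclasses import dataclass, field
from datetime import date
import random


@dataclass
class ItemCard:
    """One index card: a test item, the objective it measures, and dates of use."""
    text: str
    objective: str
    dates_used: list = field(default_factory=list)  # the "back of the card"


class ItemFile:
    """A test item file kept as a simple collection of cards (hypothetical design)."""

    def __init__(self):
        self.cards = []

    def add(self, card):
        self.cards.append(card)

    def discard(self, card):
        # Items judged poor after a testing are removed from the file.
        self.cards.remove(card)

    def pool(self, other):
        # Merge a colleague's file, e.g. on a grade-level or district basis.
        self.cards.extend(other.cards)

    def build_test(self, objectives, per_objective, today, seed=None):
        """Draw items for each objective, preferring the least recently used cards."""
        rng = random.Random(seed)
        chosen = []
        for objective in objectives:
            candidates = [c for c in self.cards if c.objective == objective]
            rng.shuffle(candidates)  # mix up the items
            # Stable sort: never-used cards first, then those used longest ago.
            candidates.sort(
                key=lambda c: c.dates_used[-1] if c.dates_used else date.min
            )
            chosen.extend(candidates[:per_objective])
        for card in chosen:
            card.dates_used.append(today)  # record the date of use
        return chosen
```

With four cards on file for one objective and two items drawn per testing, two successive calls to build_test return different pairs of cards, which is the effect the dating of cards is meant to achieve.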


OVERALL FUNCTIONS OF TESTING 


rcCr'sT"' solving shills 

and behavior (Chapter 7 ) Performance competencies 

(5) standardized testing oI mtelhgence or mental abilities (Chap- 

13) and study skills (Chapter 
(7) standardized testiiKr of 

dents ‘nterests and attitudes^tow^^ri of stu 

14) school and self (Chapter 

“re “Iso uumeroL^burconsistT P™ applied 

ypes (1) „;X;r„rf2? f •''= fP'lowingThme 

uona fv “PPfcoiXs Svidua “PP'^mious 

tionally received the most att™, '.“PP'^^'rens have trad, 






The Role of Testing In Program Improvement 


The purpose ot rejt ^pp—s o1 

sr^=rh:srhtYtt,pro,ra..^;h^ 

the students Since “‘'""f “I teacher they will be descnbed 
tional, and are more the role of the teacner, tn y 

first 

INDIVIDUAL APPLICATIONS OF TEST DATA

There are basically four broad areas in which test data can be
applied to individual students. These are (1) monitoring and
certifying student progress, (2) diagnosing individual strengths
and weaknesses, (3) prescribing instruction, and (4) providing
student feedback. Each will be described.

„ Ifnowledce of a Monitoring and 

Teachers have the responsibility to prescribe Certifying 

student's continuing Progr^s student learning are Student Progress 

instruction The most j ^ed achievement tests * 

teacher built tests and stanaar 

Teacher-built Tests. The success with which teachers' tests
evaluate student progress depends on the extent to which these
tests meet the five criteria. Since these criteria have been
thoroughly described, they will not be covered again. However,
the quality of decisions about individual student progress will
vary as a function of the quality of the tests on which these
decisions are based.

* Of course, teachers also use such nontest evidence as homework and
observation of classroom performance in judging individual student
learning.






It is recommended that teachers do their recordkeeping of
test performance in terms of objectives and that testing be
undertaken for the degree of proficiency on each objective.
Figure 15.2 shows a sample recordkeeping system for a class: the
students are listed down the side of the form and the objectives
across the top. There is, then, one box for each student's
performance on each objective. The box may contain the date on
which adequate proficiency on that objective was demonstrated or
a number indicating degree of proficiency attained.

Where adequate proficiency was not attained the box would 

This recordkeeping procedure uses the objective as the
unit of analysis rather than the test or the like. Since the
teacher's tests are designed to measure degree of proficiency on
objectives (if the position laid out in this book is followed), it
makes sense to utilize the objective as the unit of analysis for
recordkeeping and evaluating as well.

All of the objectives listed do not have to be cognitive in nature.
Objectives describing attitudes and behavior (the affective
domain) can be included as well. Of the cognitive objectives, all
do not have to deal with the acquisition of knowledge.
Objectives dealing with cognitive processes such as analysis and
synthesis are also quite appropriate. (See Chapter 3 for a
description of the relationship between objectives and tests
designed to measure them.)

This procedure for certifying student proficiency presents
three possible ways for dealing with individual differences in
performance. One is to allow students to proceed through instruction
at their own rate so that individual capability will be reflected in
terms of the number of objectives on which a student has acquired
proficiency. A second is to report degree of proficiency based on
the percentage of items measuring the objective that the student
answers correctly. A third is to distinguish between objectives in
terms of their complexity or difficulty, with more capable students
concentrating on the more complex or difficult objectives.
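
The recordkeeping idea in this section (one box per student per objective, holding a number that indicates degree of proficiency) can be illustrated with a short sketch. The degree of proficiency is computed as the share of the items measuring an objective that the student answered correctly. The function names and the 0.75 criterion for "adequate" proficiency are assumptions made for illustration only; no particular cutoff is prescribed here.

```python
def degree_of_proficiency(items_passed, items_used):
    """Fraction of the items measuring one objective that the student passed."""
    if items_used <= 0:
        raise ValueError("at least one item must measure the objective")
    if not 0 <= items_passed <= items_used:
        raise ValueError("items_passed must be between 0 and items_used")
    return items_passed / items_used


def record(roster, student, objective, items_passed, items_used):
    """Fill in one box of the roster: roster[student][objective] = degree."""
    roster.setdefault(student, {})[objective] = degree_of_proficiency(
        items_passed, items_used
    )


def objectives_attained(roster, student, criterion=0.75):
    """Objectives on which the student meets the (assumed) criterion."""
    return [
        objective
        for objective, degree in roster.get(student, {}).items()
        if degree >= criterion
    ]
```

Counting the entries returned by objectives_attained gives the first of the three ways of reflecting individual differences, the number of objectives on which a student has acquired proficiency; the stored fractions give the second.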

XT referenced standardized tests are typ 
Published Tests ° jtudent progress than teacher tests 
ically of less ”,®™nform as closely to a teacher’s objectives as 

recorded as the number of 
. For example degree of P’f ““5„e ?hat were passed divided by the 
flemrused to measuw ^ g „eie passed 

number used such as W 



Individual Cumulative Record
of performance on the
Tests of Academic Progress



Reproduced by permission of the Houghton Mifflin Company, copyright © 1964




do his or her own tests. (However, they can be used to some extent
in this capacity by comparing a student's standard scores over
time, using a form like that shown in Figure 15.3.) Published
criterion-referenced achievement tests are written in terms of
specific objectives and hence may be more useful than their
norm-referenced counterparts for evaluating or certifying progress
made by the student.

To increase the appropriateness of a standardized achieve 
ment test, a teacher would have to go through it and “"'T 

those Items that relate to his or her specific objectives 
tion, this procedure however, would negate ^ 

and hence male the scores difficult to inte^re i based 

ach.evement tests are written for national use. t^hj 
on a set of objectives that is ^^bile standardized 

specifically appropriate for program, this 

tests have apphcabihty Part of ‘ monitonng and evalu 

applicability is only minimal m the objectives of 

ating individual student of objectives can be 

instruction However, item screemng m tetms o 

done on published criterion 4 Thus these tests 

preted m terms of objectives learning progress using the 

have potential for evaluating Figure 15 2 (Refer also to 

kind of recordkeeping scheme criterion 

the report forms shown m Figures 13 8 

referenced reporting ) „„,tuating a student's progress rela 

When your interest ‘ sjuJents rather than relative to 

live to the performance “ , , achievement test A report form 

objectives thenuseastan grade equivalence scores 

such as appears m Figure ,-.„dent s performance to that of 

which can be used to compare a 
his or her grade males 

4 4 ,.c testing IS not to identify students' levels 

The purpose of diagnosnc ^^e^^^^^, objectives but to identify 

of proficiency on turning performance, and abilities This 

more general levels o ^vith the skills and competencies that 
kind of testing is u,red to advance in the learning se 

students need to . . gre often hard to identify diagnostic 

quenee Because prerequisFc^ 

4 Some test publishers have made certain standardized tests interpretable
on a criterion-referenced basis (see Box 13.1).


Diagnosing 
Individual 
Strengths 
and Weaknesses





tests must either (a) survey a wide range of such skills and
competencies (thereby having the likelihood of covering all
prerequisites) or else (b) measure specific objectives on which
proficiency is required before the student can move on to subsequent
objectives in the sequence.

basic*«r,^i '■‘==‘•'"8 and arithmetic are the 

strenpih P™'"'*® information about general 

teacher can detect ^ ^3 5) the 

over a child s performance her° before their influence 

Pic lAeCharlfe BZrin plriV;^^^^^^ P”” 

grade level in all test performing at or abo\e 

science If this pattern contlnSes'^rb'T'* 

begin to become hmiled by his dtfficuhv I' * reading ability may 
■ngs Certainly tn social sc, enre he1Z nVeVr’“'""® 
this test information Miss Jones rbi , ^ Without 

not have known of these two areas d/hTw would perhaps 

perceived them the test scores would she already 

dence in her judgment ^ increase her confi 

depend instead "pon^pubhshTd oner'A'h"''* ‘‘'“8"Pstic tests but 
particular can be helpful lo teacZ test results in 

eemphshment relative ,0 ability fehauslr'"''"® °f pp 

sts^'SZd toZver a P^W'she" cnttmn 

t‘'es and (2) a te« Ik , measures Derfnrl^ cntenon 

Were co\ered P i performan on objec 

considered etvl’" ’"r"’' goZ Zr that 

sequence and finding' a°Z**'"® Purposes By lookm 

may find the has., r ““s score on an nb, . ® ™ the 

Ir^euTLTfi::^" '^"-&ad : 

tforms such as appear m 



Figure 15.4 Stanford Achievement Test, Individual Record for Charlie
Brown, Grade 3 (teacher: Miss L. Jones; school: South Elementary).
Subtests shown include Reading Comprehension, Word Study Skills, Math
Concepts, Math Computation, Math Applications, Spelling, Language, and
Social Science.

Reproduced by permission; copyright © 1973 by Harcourt Brace
Jovanovich, publisher.





Figures 13.9 and 13.10 are used to report results on
criterion-referenced tests.)


In general, broad deficiencies or strengths can be diagnosed
by means of published norm-referenced tests, while specific
deficiencies can be determined using criterion-referenced tests
that extend reasonably far back in a learning sequence.


Prescribing

Instruction 


Icr,brr*“e '5 n°< fixed the teacher may pre 

a l£ge paf or,h™‘^“’™' t°r students In most eases 

xequenee of .nstmerorXTd\"rb: 

progress may be varied ‘ ™ " “ student can 

Brown for examp etc, ted ‘'’= “se of Charbe 

uce 15 4) hts weakness m word T ,‘^n ® '’“cd on Fig 

standardised test scores suggests that thi "t 

with vocabulary cards or do Sons ‘ ^ 

With his reading His teacher could uk conjunction 

order to pinpoint areas in which addiim^ 1 ®"°^*'° ’'“‘*'"8 *cst in 
helpful additional mstniction would be 

•“Cher tffitsTnt’on'iMn™°-™ he based on the results 



St™. — j ““jeciives on which he :V ,''‘““uuiar student 




Winston Churchill was also an underachiever


ii-.nt source of feedback for students, 
Although tests are an ,^3, , ells only their overalllevel 

merely providing them m demoralizing than helpful The most 
of performance may be m objectives 

helpful feedback should b ,3^0 of their mistakes Test feed 

they failed to master an information to help him or 

back should provide ® nerformance, mere scores can be 
her improve subsequen Witness the standardized test 

more of a frustration t an ^ similar version of which was 

score report shown m ^ third grader How will the mfor 

sent home with Charlie r , jiglp Charlie’ Since it doesn t 
mation on the individual re j,e, u may produce either 

tell him exactly ® ^ense of self satisfaction (perhaps the 

frustration or a compe i j-gjatively high) 

latter because his , ,est results to serve a helpful feed 

In order for achie teacher must be able to point out 

back function for a stuaen ■ 3 3, jo 

to that student where m ,5 available (often not 

this unless information at tm 


Providing 

Student 

Feedback 





the case with standardized test results). Hence standardized test
results have the potential to assist teachers and thereby indirectly
assist students (through prescriptive instruction) but not, under
most circumstances, assist students directly. Therefore the major
source of feedback for students is teacher-built tests. To maximize
the feedback value of such tests, teachers should do the following:

(1) Give a pretest (i.e., the end-of-unit test itself or an alternate
form of it prior to the instruction). If pretesting is not viable

fcomTalon 'hcmselees not ,ust the scores The 

test sc^rrre^T shortcomings as most standardized 

peir„reZX:t.“s““"“’ 

and what ought to be vour m between what is 

should tell them what ought to be 

en“;'?„:~ “srtrbr 

back ,n a constructive 1' ^,h 1"' “"“P' 

themselves against it This usuallv me 
least a second chance to pass the . ™ at 

jectmg the philosophy that tests shnoM “ 

™P™em.hertha„s,mp,y.„“Setrp:f“^ 


CLASSROOM APPLICATIONS OF TEST DATA


— SHiSS'"-**’ A'W r-"" 





Consider the sample roster in Figure 15.2. While it was constructed
originally to record and evaluate individual student performance,
it can also help a teacher to evaluate his or her own performance.
The last row of the roster presents the results on each objective
accumulated across all the students in the class. By examining the
accumulative results on each objective, the teacher can determine
those objectives on which additional or different instructional
procedures are needed. Suppose, for example, that approximately 80%
of the students demonstrate adequate proficiency on all but five of
the objectives. On these five, only 50% of the class at most perform
at an acceptable level The teacher must first “n™ne the obje 
ttves themLlves and the test .terns used - 

sure they were not provided for these five 

:;TeenveTv:L";n^ 

of instruction student performance taken mdi 

The important PO^US tha .^dividual per 

Mdually reflects upon ,o help tell the teacher 

formances can considered ^ ^ working as well as it 

empirically whether his or her instruction 

should be 
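
The "last row of the roster" can be computed mechanically. The sketch below assumes a roster in which each box simply records whether adequate proficiency was shown; it totals the results for each objective across the class and lists the objectives on which no more than half the class performed acceptably, as in the example above. The function names and the 0.5 cutoff are illustrative assumptions, not a prescribed standard.

```python
def class_summary(roster):
    """Share of the class showing adequate proficiency on each objective.

    roster: {student: {objective: True if adequate proficiency was shown}}
    """
    totals = {}
    for marks in roster.values():
        for objective, attained in marks.items():
            passed, seen = totals.get(objective, (0, 0))
            totals[objective] = (passed + int(bool(attained)), seen + 1)
    return {obj: passed / seen for obj, (passed, seen) in totals.items()}


def objectives_to_reexamine(summary, cutoff=0.5):
    """Objectives on which at most `cutoff` of the class performed acceptably."""
    return sorted(obj for obj, share in summary.items() if share <= cutoff)
```

An objective flagged in this way is only a signal to look again at the objective, the items, and the instruction; the sketch does not say which of these is at fault.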


Using Results 
of Teacher built 
Tests 


. j ,-ri scores to help evaluate class 
To be able to use standard! foj- the classroom as a 

room instruction they mus summary of scores is shown 

unit For illustrative third grade class Based on this 

m Figure 15 5 for Miss ^ appear to be doing well m all 

report Miss Jones third g graders 

tested areas In a gtanmes The weakest areas are 

performing m the hig each with 37% performing 

vocabulary and math these scores Miss Jones should 

below average On the ^jth those small groups of 10 

consider working more , same ones) who are performing 
students each (not math computation and spending 

below average in vocabulary an 

less time teaching spe gfgrenced tests can be used quite conve 
Published criterio classroom achievement monitoring sys 
niently as the ° .-rmtive Mathematics Inventory for exam 

tern Level A of the Presc 

pie has 107 items t-ei four segments He or 

all the objectives mea subtests (m this example of 26 

she can divide the 


Using Results of 

Published 

Tests 








27, 27, and 27 items respectively), and administer them as shown
in Figure 15.6. The class is divided into four equal groups, with
each group taking one-quarter of the test during each testing time.
At the first testing, for example, the group that takes items 1-26
will be taking an immediate posttest following instruction on these
objectives. The other three groups will provide pretest data. By
the second testing, the group taking items 1-26 will be taking a
retention test. Now the second group of items (27-53) will
constitute the immediate posttest. The remaining two groups of
items, as yet untaught, still represent pretests. By the third testing
time, only the last group of items is being pretested, and by the
fourth testing time, three of the four groups are being tested for

retention^ each student takes each test item oniy once, under 

this system the teacher has a basis for 
quate learning is taking place, /t is not a sys i 




                   Testing        Testing        Testing        Testing
                   time 1         time 2         time 3         time 4

Student Group 1    Items 1-26     Items 81-107   Items 27-53    Items 54-80
                   IPT            PrT            ReT            ReT

Student Group 2    Items 27-53    Items 54-80    Items 1-26     Items 81-107
                   PrT            PrT            ReT            IPT

Student Group 3    Items 54-80    Items 27-53    Items 81-107   Items 1-26
                   PrT            IPT            PrT            ReT

Student Group 4    Items 81-107   Items 1-26     Items 54-80    Items 27-53
                   PrT            ReT            IPT            ReT

IPT = immediate posttest
PrT = pretest
ReT = retention test

Figure 15.6 A System for Monitoring Achievement.




d.v,dml fcrtonmnce but a system Jcr 
the earns from pretest to immediate posttest do not reiiec 
quafe proficiency on all the objectives taught ^ment, th 
Lpplementary or addtltonal instruct.on can The 

goal is for all students to demonstrate acceptable levels of
proficiency on all the objectives measured by the 107 items over the
course of the semester or school year.
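
The rotation behind Figure 15.6 can be checked with a short sketch. It takes the group-by-testing-time assignment as read from the figure, labels each cell as an immediate posttest (IPT), pretest (PrT), or retention test (ReT) on the assumption that the k-th block of items is taught just before the k-th testing time, and confirms that every group meets every block of items exactly once. The names SEGMENTS, ASSIGNMENT, role, schedule, and is_balanced are hypothetical and used only for illustration.

```python
# Item ranges of the four subtests (26 + 27 + 27 + 27 = 107 items).
SEGMENTS = [(1, 26), (27, 53), (54, 80), (81, 107)]

# ASSIGNMENT[group][time] = index of the segment that group takes at that
# testing time, as read from Figure 15.6 (groups and times are 0-based here).
ASSIGNMENT = [
    [0, 3, 1, 2],
    [1, 2, 0, 3],
    [2, 1, 3, 0],
    [3, 0, 2, 1],
]


def role(segment, testing_time):
    """Assumes segment k is taught just before testing time k."""
    if segment == testing_time:
        return "IPT"  # immediate posttest
    return "PrT" if segment > testing_time else "ReT"  # pretest / retention


def schedule():
    """Rows of (item range, role) pairs, one row per student group."""
    rows = []
    for group in ASSIGNMENT:
        row = []
        for testing_time, segment in enumerate(group):
            first, last = SEGMENTS[segment]
            row.append((f"{first}-{last}", role(segment, testing_time)))
        rows.append(row)
    return rows


def is_balanced(assignment):
    """Each group takes each segment once; each segment appears at every time."""
    n = len(assignment)
    full = set(range(n))
    rows_ok = all(set(row) == full for row in assignment)
    cols_ok = all({row[t] for row in assignment} == full for t in range(n))
    return rows_ok and cols_ok
```

Because the assignment is balanced in both directions, each testing time yields data on all 107 items while each student answers any one item only once.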


Box 15.1


TESTING: FOR WHOM AND WHAT?

There have been some recent instances of opposition from teacher
organizations to testing programs. The professional organizations claim
that the results of the testing programs will be used to evaluate teachers
and that such evaluation is outside the terms of their contract. The latter
point is usually indisputable; teachers have a right to negotiate the terms
of their employment, and evaluation as a factor in job security is clearly
one of the most important of these terms. Formally trying to change these
terms through collective bargaining to include the application of
classroom test data for the task of teacher evaluation may be imminent in
public education.

It is the first point, the intent of these testing programs, that is the
more critical at the moment. Those carrying out the testing program are
typically interested in program or system evaluation and/or in providing
teachers with useful feedback rather than evaluating them. Teachers are
suspicious that such data may be otherwise used by their detractors and
are right in protecting themselves from this unfair possibility. It must be
pointed out that poor performance by students taken as a classroom
group does not necessarily indicate poor teaching; poor performance also
reflects on facilities, budgets, administrative support, and the
characteristics of the students themselves. However, poor teaching does
represent one of the potential causes of poor student performance (90% of
school operating budgets is spent on teachers' salaries). Certainly school
systems would benefit if teachers were responsive to collective student
performance data as a source of feedback and basis for self-improvement
and administrators and school boards were willing to allow teachers some
margin for self-improvement, rather than the two sides being in conflict
over the testing issue.





PROGRAM AND SYSTEM APPLICATIONS OF TEST DATA 

For purposes of measurement it is convenient to separate
educational variables into input, process, and output. Measurable
educational inputs are the givens of an educational system
budget, facilities, community needs and the 
students and teachers bring with them into the sc oo 
students input would include pnor learning and ^ 

experience Measurable educational processes zre . ^ 

chLcteristics of the system 

teachers the administering behavior o a "J'"’ measur 

operafon of .he .ns.ruConal 

able educational outputs f attitudes and behaviors 

ment These are learner achieveme a^H processes of 

that when taken together reflect upon e -j 

the school These relationships ® * J instruments to measure 
It IS possible to use One „ray to measure input 

school inputs, processes and o ,L static qualities of the schools 
would be to measure in Monday morning as the stu 

or what they look like at 8 00 a capacities needs 

dents and staff enter the >>uil^J'n8 experienced is the supenn 
and expectations of the teacn cchools’ How does the com 

tendent’ How well programs are available in the 

munity feel about education school budget’ 

schools for the students’ What is the 


Figure 15.7 Examples of Input, Process, Output and Their Relationship


INPUT                     PROCESS                   OUTPUT

Available Monies          Teacher Style             Student
Facilities                and Technique               Achievement
Characteristics of        Administrator Style         Attitudes
  Students                and Technique               Behavior
  Teachers                Program Operation
  Administrators
Needs
Programs







"It's nice to see a cheerful headline for a change."


What abilities do the students possess? How much have they
already learned? As education proceeds, input is constantly changing.
Activities that go on in the school constitute process. The way
that teachers handle disciplinary problems, the amount of individual
attention students receive, the extent to which kits and other
learning devices are used, the extent to which teachers are provided
with suitable and constructive supervision, the using of math
workbooks, the taking of nature walks: all these are manifestations of
educational process.

The chief focus of educational measurement is on output. In a
very real sense, the "products" of education are the students; thus
it is important to measure their collective acquisition of skills,
knowledges, attitudes, and behaviors. It is important, too, to measure
the individual acquisition of these skills, knowledges, attitudes,
and behaviors in order to evaluate, facilitate, and certify that
growth and progress. Thus we can distinguish for purposes of
educational measurement individual effects and group effects (the
latter being the sum of the former and thus representing
tendencies) on school input, process, and output (see Figure 15.8).





Figure 15.8 Some Examples of Potentially Measurable Dimensions of the
Schools.


INPUT

  Individual: How well-trained is teacher X?
  Group:      What is the average amount of training of teachers in the
              system?

  Individual: What are administrator Y's attitudes toward pupil control?
  Group:      What do the administrators believe is the best way to
              control students?

  Individual: How bright is Johnny?
  Group:      How bright are the 1st graders in the school?

PROCESS

  Individual: Does teacher X use individually designed workbooks?
  Group:      How many teachers use individually designed workbooks?

  Individual: Does administrator Y have an open door policy?
  Group:      What is the policy of the administrative staff?

  Individual: Are students in teacher Z's class permitted to choose
              their own tasks?
  Group:      In how many classes may students select their own tasks?

OUTPUT

  Individual: ... of reading give Jane the most trouble?
  Group:      How well can the 2nd graders in the District read?

  Individual: How does Emil feel about the rights of women?
  Group:      How do children in the school feel about women's rights?

  Individual: How well-mannered is Ethel toward adults in the school?
  Group:      How well-behaved are the students?


, one way of doing something is better 

While It ts difficult to be ^„,,jerable conHdence tn specifying, 
than another, we “n tavej parforntance is better than ow 

for example, that ^'8^ reau, ®ords, it ts caster to attach posit.ye 

reading to processes People may not be able to 

values to outcomes than hey can agree 

agree on ‘be best way t^^^^ ^ education Thus, 

that improving their processes are appropriate for 

educational °V^P'liuating educational effects 
measuring and ev 


Measuring 

Educational 

Output 





A second reason for focusing on outputs as measures of
educational effectiveness is that we are far more adept at measuring
output than we are at measuring process (that is, we are much
more adept at measuring how students perform than how systems
perform). Also, we are an output-oriented society with a tendency
to value our processes only if we also value their products. In the

work""' ™ ’’y 

mg tL1choo^'^^°“"’“'^ ” examm 

dents r.heL basluwf “‘‘f The performance of stn 
ardized test results such as measured by examining stand 

average South Elerma™ ^ On the 

appear in Figure 1591 ools third graders (whose results 

all tests A look at the 

\ocabuIary to be the weakest -.r stanme column reveals 

pcrfomungbelou ave^re 

helpful 10 mTe°l“ompam^^^^ >5 9 it is 

improved in any area from last Year to of students 

Assume that the school was rauJrl Toar’ Has it worsened’ 

manes workbook senes that ^ new mathe 

lest results could be used to answer Achievement 

Hate these third grade students following questions (1) 
nonal skills during the course of oomputa 

materials were use’ (2) Ire ^ “Penmental 

huZ“'a‘‘‘''‘^"™'™*bU'^P y^i-V Binders all of 
ih r1 ® nf vehom used 1 ™'"B '’='‘ei' lhan last year s 

Ihird graders tn South Elemen, ^i-hbook’ (3) Are the 

uorkbook perfonumg T ?" “f whom used the new 

cJiamincd ac-m. “ ^ standardized >.nk 'vork 

“<^ross time across districts 

(companng this vea?! Progress of a class) here) 

s>stcm or nroffr^^ *c^hcr tests can also k ^ P^S^am effects 
‘I'ms and h''? ^mmed^conectw ^ roneetton of 


on each'^nk"*' P^^'^ontage < 
each objective over tf 














course of the school year. Thus, if the teacher uses Figure 15.2 to
monitor the acquisition of individual student proficiency (for
purposes of facilitating individual learning) and then sums up the
results across students to monitor the effectiveness of classroom
instruction, it is then possible to sum up the classroom results
for an entire school, grade level, or school district to arrive at an
appraisal of system effects in the basic subject areas.
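
Summing classroom results upward can be sketched in the same spirit. The function below pools per-objective counts from several classrooms into a single share of students proficient for a school, grade level, or district. The input layout (a count of proficient students and a count of students tested for each objective) and the name pool_classrooms are assumptions made for illustration.

```python
def pool_classrooms(classrooms):
    """Pool per-objective results from several classrooms.

    classrooms: list of {objective: (students_proficient, students_tested)}
    Returns {objective: share of all tested students who were proficient}.
    """
    totals = {}
    for results in classrooms:
        for objective, (proficient, tested) in results.items():
            p, t = totals.get(objective, (0, 0))
            totals[objective] = (p + proficient, t + tested)
    return {obj: p / t for obj, (p, t) in totals.items() if t > 0}
```

Pooling counts rather than averaging classroom percentages keeps large and small classes in proportion when the results are combined.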

consfdeTa toad^s?? a ^astc subjects 


Box 15 2 


evaluation 

been a part of thniraiesy orptooram has long 

programs are Intended t? be exBmnT.°““ ’’“'’'''b'P'IS' when such 
lenerel funds Perhaps soh^ dS'? ""5 hy state and 

« " 11 represented an exemplary pronraL T! 

"en 'he (ollowmg resources ma^beZptuI ” 

('I 'esIingexperKslonstatl 
( 2 ) consultants 

(41 remtr eeb 

rence sources and libraries to 

bSl™-- ” 

P 'nl^pS'rST 

school grade 

pul with respect to each' ‘''"®™'ning goals and ml “^'^'^"1=0 will 

Sian Should be aL?f°°°' '“oaMeLchers ' * ""’“"“""S ”“1 

eaked t„ pauicipjte In delerm mo “Pmui'elralive 

'""erm.ning and measuring goals 





learning areas, all of which are an important part of growing up
but not all of which are covered in the schools. While it has often
been emphasized that reading and mathematics are the most
critical areas of school responsibility, it is important to
acknowledge the many areas of development at which schooling is and
should be aimed. The list in Figure 15.10 represents an attempt at
a broad and comprehensive list of such goals.

Ideally, each school district should generate its own list o 
goals (with teachers parents ose 

and consider the attainment of those go^s as e ^ , 

of the group school testing ’’^jfesTa/bf LmbLd 

gram existing achie\ement and attitude rtnt-nn<;e Cer 

and supplemented by new ones ° ,a might be use 

tamly, a number of the tests described f hapte « -rngM te 

ful for this purpose The ZZl dl^trmts the 

instruments is to make ^ expecting a single instru 

major disadvantage "of tmmtf sZ's may be 

ment to fit a variety but would have to be con 

used to supplement published tests oui 

sidered across classes ® 


» he out into perspective by con 
The matter of f ["'’“beve discussion of output To a large 

sidering it in relation to the measures are the same except 

extent, educational input a ^ jgquence and out 

that input measurement school s proficiency can be 

put measurement at the con . ^^^^j^ce at one point in time but 
based not on its final leve ® output Input measurement as 

on the degree of go'”,'™? , "IfL.nation of proficiency can serve as 
an essential part of the ^ which gams are to be evaluated 
the performance basehn ^ in addition 

that is the basis for the d reflect their individual financial 

if learning gams are adjusw ,^^b,le this formulation 

cost a district can r^ge educational miserliness it may 

should not be used to b„ut their best allocation of money 

help districts make decisi 

sach goals do not belong in the schools 
. Indeed some people »">> SVsS mmlje the comnrnmty m goal set 

“h°aspSss.bIe_a^f“p's-“^^^^^ 

tmg but even equaUy high appropnateness in each 

lev can be shown 


The use — 
unless they can 
district used 


Measuring Educational Input



Figure 15.10 


List of Educational Goals.*


LEARN HOW TO BE A GOOD CITIZEN

D Develop an understanding of the obligations and responsibilities
of citizenship


LEARN HOW TO RESPECT AND GET ALONG WITH PEOPLE WHO
THINK, DRESS AND ACT DIFFERENTLY

A Develop an appreciation for and an understanding of other people and

B Develop an understanding of political, economic, and social patterns
of the rest of the world

C Develop awareness of the interdependence of races, creeds, nations
and cultures

D Develop an awareness of the processes of group relationships

LEARN ABOUT AND TRY TO UNDERSTAND THE CHANGES
THAT TAKE PLACE IN THE WORLD

A Develop ability to adjust to the changing demands of society
B Develop an awareness and the ability to adjust to a changing world
and its problems

C Develop understanding of the past, identify with the present and the
ability to meet the future

DEVELOP SKILLS IN READING, WRITING, SPEAKING AND LISTENING
A Develop ability to communicate ideas and feelings effectively
B Develop skills in oral and written English


UNDERSTAND AND PRACTICE DEMOCRATIC IDEAS AND IDEALS
A Develop loyalty to American democratic ideals
B Develop patriotism and loyalty to ideas of democracy
C Develop knowledge and appreciation of the rights and privileges
in our democracy

D Develop an understanding of our American heritage

LEARN HOW TO EXAMINE AND USE INFORMATION
A Develop ability to examine constructively and creatively
B Develop ability to use scientific methods
C Develop reasoning abilities
D Develop skills to think and proceed logically


UNDERSTAND AND PRACTICE THE SKILLS OF FAMILY LIVING
A Develop understanding and appreciation of the principles of living
in the family group

^ '”*"3 "» eccepience ot fesponsibilmcs as lamtiy 


C 


***'*1' responsibilities and 
achievement ot skills in preparing to accept them 


*'E****CY AND GET ALONG WITH PEOPLE 
WITH ihom we work and live 

c Develop a cooperaiive attitude toward hvmg and working with others 
A^ enter a SPECIFIC FIELD OF WORK 

C Develop an aspreciai.cm ol good workmanship 



LEARN HOW TO BE A GOOD MANAGER OF MONEY, PROPERTY
AND RESOURCES

A Develop an understanding o* economic P^'**!^*?* and 

B Develop ability and understanding In personal buying se g 

C SS'sMIls in n,nnagemen, ol na.nral ana h»nan rasoarcaa and 
man s environment 

DEVELOP A DESIRE FOR LEARNING NOW AND 
A Develop intellectual curiosity and eagerness for lifelong learning
B Develop a positive attitude toward learning

C Develop a positive attitude toward continuing independe 

LEARN HOW TO USE LEISURE TIME 

A Develop ability to use leisure lime pr^uct^eiY , jglsure 

B Develop a poslllveallltube toward paiMlpauanwa. 

time actlvlties-physlcal will lead to wise and 

C Develop appreciation and Interests wnic 
enjoyable use ol leisure time 

PRACTICE AND UNDERSTAND THE SAFETY 

A Establish an effective health ^d well being 

B Develop an understanding of good P^ysicaj ^n^o^ation 

C Establish sound personal health habits ana mioi 
D Develop a concern for public health and safe y 

APPRECIATE CULTURE AND Measwd cultural 

A Develop abilities lor eliective espressioo of 
appreciation (line arts) ya„eus forms 

B Cullivate appreciation for Mauty n va ^^gj, 3 (an mgs.c 

C Develop creative sell expression inro«9 „ „„„uaaas 

writing etc) , mu«,c an literature and foreign languages 

D Develop special talents In mosic an 

GAIN INFORMATION fg^M^richoSIn ''Nation to student s 

* r;.irir 7, „d — 

“ °lSS'’,p'’,'d“«lic.°d»>; » 1“ .poor a p.r..Pdi.r v«a..a" 

C Develop a knowledge of specific in 

ING OF SELF WORTH 

OEVEUerP PRIDE IN “arfd ws achidvePdals aad progress 

g seeo..., aod 

C Develop the student s teei v 
self assurance 

B Develop the student s cap goals and processes of 

and play constructively of values go 

C Develop a moral and emic ideas 

D i7.'’,f.°p'Sd..ds=>pe-SP"»' 

0.,N A CENERAU EPUCAT.ON^ „„„pp,s ...pr., scepces 

? KS’pS'^'”"”” 





Measuring Educational Process


Box 15 3 


Schools are encouraged to measure process in order (1) to determine
whether intended processes are being carried out, and (2) to
discover which processes produce the desired outcomes. If educators
can measure the processes of teaching, of classroom and
on^ °t ‘u “f teacher morale, and so 

cLs^in M could be used to increase the 

gains in student learning or reduce the costs of these gains or both.

behatjTSl^nrst^rtr 


implementing the system testing program 

a dislrlct should'liret' Sr Koeai'/o self-improvement, a 

stiocld accepi iPe principle o' usha Secondly, a district 

well'" -^-'■wTde'::d;:i;"e?:ri,fer.ra°s 

tPe groalesl potenllal (Particularly the published ones) have 

The fourth step should h« 

'°oolT"" '’"t'etmanoe on each goal 

e seals specific lor their grade levef Thl Pedormance 

'"e ““tr/c- 

- evaluate s'p- 1^::™::™ S? ‘ 





Whether the purpose of testing is to monitor and improve individual,
class, or system performance, the pattern of procedures is
largely the same (and has basically served as the theme of this
book). In all cases objectives must be specified, and tests must be
constructed or found to measure those objectives: tests that are
appropriate, valid, reliable, interpretable, and usable. To determine
the degree of proficiency that has been attained by the
student, class, or system, the test results are compared to the
objectives. Where proficiency levels are adequate, no additional
evaluation is needed, but where performance is below acceptable
levels, test results should be analyzed to provide intensified or
alternative programs of instruction to attempt to bring performance
to an acceptable level, and further

sary To monitor, to diagnose, to the 

the achievement of proficiency, to certify t ese , 

functions of tests in the schools, for the students, the teachers, and
the administrators.





Additional Information Sources 

Anderson, S. B., et al. Encyclopedia of evaluation. San Francisco:
Jossey-Bass, 1974.

Bauernfeind, R. H. Building a school testing program. Boston: Houghton
Mifflin, 1963.

Davis, F. B. Educational measurements and their interpretation. Belmont,
Calif.: Wadsworth Publishing Co., 1964.

Findley, W. G. The impact and improvement of school testing programs.
Sixty-second yearbook of the National Society for the
Study of Education, Part II. Chicago: University of Chicago Press,
1963.

Glaser, R., & Nitko, A. J. Measurement in learning and instruction. In
R. L. Thorndike (Ed.), Educational measurement, 2nd ed. Washington,
D.C.: American Council on Education, 1971, Chap. 17.

Goldman, L. Using tests in counseling. N.Y.: Appleton-Century-Crofts,
1961.

Strong, R. How to report pupil progress. Chicago: Science Research
Associates, 1961.

Tyler, R. W. Educational evaluation: New roles, new means. The
sixty-eighth yearbook of the National Society for the Study of Education,
Part II. Chicago: The University of Chicago Press, 1969.

Womer, F. B., & Wahl, N. K. Test use. In R. L. Ebel (Ed.), Encyclopedia
of educational research, 4th ed. N.Y.: Macmillan, 1969, pp.





Self-test of Proficiency


(1) You are trying to convince your colleagues to collaborate on the construction
of a test item file. Cite three reasons you would use to
convince them that the file is a good idea.

(2) Describe the steps in establishing a test item file.

(3) Individual test results can be very helpful to the classroom teacher.
Give an example of how the teacher can obtain and use test data to

a monitor and certify student progress
b diagnose individual deficiencies
c decide whether or not to prescribe additional instruction
d provide feedback to students

(4) Construct a sample roster for recording students' performance by
objective.

(5) Consider the class summary of achievement test performance shown

a What is the area of greatest strength? How did you determine this?
b What is the area of greatest weakness? How did you determine this?
c Based on the print-out, what changes in instruction would you
recommend the teacher make?

(6) Describe a system for classroom achievement monitoring.

(7) a Describe a primary value of measuring educational input.

b Why are educational outputs more appropriate than processes for
measuring and evaluating educational effects?
measuri g notwithstanding there are some uses to which 

" J^rmrsurement of educational processes can List two 

(8) Describe three ways to determine whether a new instructional procedure
being used throughout the sixth grade of a school is effective.




COMBINATION 
CLASS RECORD SHEET








Appendices 


Appendix A


A GLOSSARY OF MEASUREMENT TERMS* 


Academic Aptitude. The combination of native and acquired abilities
that are needed for school learning; likelihood of success in
mastering academic work, as estimated from measures of the
necessary abilities. (Also called scholastic aptitude.)

Achievement Age. The performance level or achievement test score
expressed in terms of the chronological age group for which a
particular performance level or achievement test score is average.

Achievement Test. A test that measures the extent to which a person
has achieved something, acquired certain information, or demonstrated
proficiency in certain skills, usually as a result of instruction.

Acquiescence Response Bias. Responding to stimuli other than the
test items themselves. This usually takes the form of responding

* Adapted from A Glossary of Measurement Terms (Test Service Notebook 13),
prepared by Blythe C. Mitchell, distributed by Harcourt Brace Jovanovich, Inc.,
and reproduced by permission. It is introduced in part by the following statement:
This glossary of terms used in educational and psychological measurement
is primarily for persons with limited training in measurement, rather than for
the specialist. The terms defined are the more common or basic ones such as
occur in test manuals and educational journals. In the definitions, certain
technicalities and niceties of usage have been sacrificed for the sake of brevity and,
it is hoped, clarity. Where there is not complete uniformity among writers in the
measurement field with respect to the meaning of a term, either these variations
are noted or the definition offered is the one that writers judged to represent the
best usage. In addition to the HBJ glossary, edited for use here, other measurement
terms related to concepts developed in this book have been included.



in some systematic pattern, such as marking true (yea saying) or
false (nay saying) on every item.

Age Norms. Values representing typical, or average, performance
for persons of various age groups.

Alternate-form Reliability. The closeness of correspondence, or correlation,
between results on alternate (i.e., equivalent or parallel)
forms of a test; thus, a measure of the extent to which the two
forms are consistent or reliable in measuring whatever they do
measure.

AppmpXene«“on°achie;ement t«ts, an 

Sd by a comparison of test content with instructional objec 
tives (A'so called content 

Aptitude. A “tnbination of ^ibties and^^ ^ 

native or develop proficiency in some particular 

vidual's ability to leamo PP^ general academic ability 

area AP'^ e ability or rntfl/igcncc tests), those of 
(commonly called ittcnrai 'amencal, mechanical, or musical 
special abilities, as , ^ learning and prognostic 

ability, those assessing previous learning and are 

'"‘H’.'tralTfuture p.formance, usuri.y tn a specific field such 
fore°ign divided by their number 

lns‘usu“ny •V/'liVri’heranor measures of central 

Average A widely used averages are the arithmetic 

tendency The thr^ m j„ode When the term "average 

ruTerw"e”gnat,on as^to type, the most likely assumption 

IS that It IS the arithmetic mean sample 

Battery. A group of severa comparable 

population so that rcsu administered 

(Sometimes t. 

together, even thougn „f school achievement. 

The most comtnon separate learning areas 

which include subtest instrument used to record judgments 
Behavior Rating Seal behavior of a given individual 

about the incide svill be inlluenced by subjective factors 

or group Such JPJf^r reliability 

and should be te performance measured by a test 

Celling The upper limit P observations, or characteristics, each 

Cheekhst A >«* . ^d ans vered ’yes') by the rater or observer 

of which IS chec judged to have occurred satisfactorily and 

when the d™ sv“ed no") when jndged not to have occurred 
not checked (or answ 

satisfactorily 





476 Appendix A 


Chronological Age. A person's age, usually expressed in years and months.

Coefficient of Correlation. A measure of the degree of relationship, or
going togetherness, between two sets of measures for the same
individuals. The correlation coefficient most frequently used in
test development and educational research is that known as the
Pearson or product moment r. Unless otherwise specified,
correlation usually refers to this coefficient, but rank
and others are used in special situations. Correlation coefficients
range from .00, denoting a complete absence of relationship, to
+1.00 and to -1.00, indicating perfect positive or perfect negative
correspondence, respectively. (See correlation.)
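As an illustration of how the product moment r behaves at its extremes, the following minimal sketch computes it from two lists of scores for the same individuals. The function name and the sample scores are assumptions made for this example; they do not come from the glossary.

```python
import math


def pearson_r(x, y):
    """Product moment correlation between paired score lists x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long lists of at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one set of scores does not vary")
    return sxy / math.sqrt(sxx * syy)


perfect_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
perfect_negative = pearson_r([1, 2, 3], [3, 2, 1])
```

Scores that rise together in exact proportion give a coefficient of +1.00, and scores that move in exactly opposite directions give -1.00.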

Completion Item. A short answer test question calling for the filling
in of an omitted word or phrase.

Concurrent Validity. See validity.

Construct Validity. See validity.

Content Outline. A specification of the content covered in a segment
of instruction and of the importance of each piece of content. This
outline serves as a basis for test item construction.

Content Validity. See appropriateness.

Correction for Guessing (correction for chance). A reduction in score
for wrong answers, sometimes applied in scoring true-false or
multiple choice questions. Such scoring formulas (R - W for tests
with 2-option response, R - 1/2 W for 3-option response, R - 1/3 W for
4, etc.) are intended to discourage guessing and to yield more
accurate rankings of test takers in terms of their true knowledge.
They are used much less today than in the early days of testing.
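The three formulas listed in this entry follow one pattern: the number wrong is divided by one less than the number of options. A minimal sketch of that pattern follows; the function name and the sample counts are illustrative assumptions, and omitted items are assumed to count as neither right nor wrong.

```python
def corrected_score(right, wrong, options):
    """Score corrected for guessing: R - W / (options - 1)."""
    if options < 2:
        raise ValueError("an item needs at least two response options")
    return right - wrong / (options - 1)


true_false = corrected_score(right=40, wrong=10, options=2)    # R - W
three_option = corrected_score(right=40, wrong=10, options=3)  # R - 1/2 W
four_option = corrected_score(right=40, wrong=9, options=4)    # R - 1/3 W
```

The penalty shrinks as the number of options grows, because a blind guess is less likely to succeed on an item with more choices.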

Correlation. Relationship or going togetherness between two sets
of scores or measures; tendency of one score to vary concomitantly
with the other, as the tendency of students of high IQ to be above
average in reading ability. The existence of a strong relationship
(i.e., a high correlation) between two variables does not necessarily
indicate that one has any causal influence on the other. (See
coefficient of correlation.)


Criterion. A standard by which a test or test performance may be
judged or evaluated; a set of scores, ratings, etc., that a test is
designed to measure, predict, or correlate with. (See validity.)

Criterion-referenced Test. Term used to describe tests designed to
provide information on the specific knowledge or skills possessed
by a student. Such tests are designed to measure the objectives of
instruction. Their scores have meaning in terms of what the student
knows or can do, rather than in their relation to the scores
made by some comparison or norm group.

Criterion Validity. See validity.

'“'5 a'tmpt to prottde an equal op- 
-S, rallures and Me expert 

cnees Their content must therefore be Irnnted to that tvh.ch ts 





assumed to be equally common to all cultures, or to material that
is entirely unfamiliar and novel for all persons whatever their
cultural background.

Decile. Any one of the nine percentile points (scores) that divide a
distribution into ten parts, each containing one tenth of all the
scores or cases; every tenth percentile. The first decile is the 10th
percentile, the second the 20th, the eighth the 80th percentile, etc.

Deviation. The amount by which a score differs from a designated
reference value, such as the mean, the norm, or the score on some

Deviation IQ (DIQ). An age-based index of general mental ability. It
is based upon the difference or deviation between a person's
score and the typical or average score for persons of the same
chronological age. Deviation IQs from most current mental ability
cnronologicai “h , 1 ^ ^^^n of 100 and a standard 

TeTaboroTtriorlarSfined age group (See mfefhgence 
quouent) ^veakness or 

Diagnostic Test * „t weaknesses or deficiencies 

strength to components or subparts of a larger 

body o/lnformation or skill Diagnostic tests are most commonly 

lviir‘”'f‘’%"‘H The'pfrcenmi'of a spKified group such as students 

Difficulty Index The p answer a lest item correctly 

of a given age or grade ho^^^^ ^ differentiate be 

DIscrImlnabllllyIndex 

tween persons ® j^n^ed from the number passing the 

uem'iTthe highest third of the group (on total score) and the 

number passmg in the lovvesUh^rd^^^^ or options in a multiple 

Dtstraclor short answer test item 

choice or matching j^m,on) A tabulation of the scores of a 

Distribution (frequency 11 ^^. n„,„ber of individuals obtaining 

group of “‘•'"‘f "ber of individuals within the range of each 
each score or tne nu 

interval Svstematically constructing items to be repre- 

Domain referencing y possible conditions set forth m the 

sentative of the lu intended to be a measure 

objective for which tji ^ ly 

Equivalent Form J ^ nature of the content and the number 

parallel with «^„oms included and that will yield very similar 
and difficulty ot „t vanabihty tor a given group (Also 

average scores a ^ pora/let farm) 

called alternate standard error of measurement 

Error of Measurement ro„des test takers with the oppor 

Essay type I‘“” and compose their own responses within rela 

tunity to s'™ , Sconng of an essay type item often involves 

lively broad 11 mm. 





the subjective judgment of the scorer, in contrast to a short
answer item, which structures or even includes the correct
response, making scoring quite objective.

Extrapolation. In general, any process of estimating values of a variable
beyond the range of available data. As applied to test norms,
the process of extending a norm line into grade or age levels not
tested in the standardization program, in order to permit interpretation
of extreme scores. Since this extension is usually done
graphically, rather than empirically or according to a given mathematical
function, considerable judgment is involved. Extrapolated
values are thus to some extent arbitrary; for this and other
reasons they have limited meaning.

Face Validity. Refers to the acceptability of the test and test situation
by the examinee or user, in terms of apparent uses to which
the test is to be put. A test has face validity when it appears to
measure the variable or objectives to be tested.

Factor. In mental measurement, a hypothetical trait, ability, or component
of ability that underlies and influences performance on
two or more tests and hence causes scores on the tests to be correlated.
The term factor strictly refers to a theoretical variable,
derived by a process of factor analysis from a table of intercorrelations
among tests. However, it is also used to denote the psychological
interpretation given to the variable, i.e., the mental trait
assumed to be represented by the variable, as verbal ability,
numerical ability, etc.

Factor Analysis. Any of several methods of analyzing the intercorrelations
among a set of variables such as test scores. Factor analysis
attempts to account for the interrelationships in terms of some
underlying factors, preferably fewer in number than the original
variables, and it reveals how much of the variation in each of the
original measures arises from, or is associated with, each of the
hypothetical factors. Factor analysis has contributed to an
understanding of the organization or components of intelligence,
aptitudes, and personality, and it has pointed the way to the
development of purer tests of the several components.

Faking. Giving a test response on an affective test that is an intended
distortion in an attempt to create an impression. It is more likely
to occur when the test taker feels pressure to respond in a particular
way, such as when taking an employment test.
Forced-choice Item. Broadly, any multiple choice item in which the
T?"’.”'™ *“ “'“t or more of the given choices 

choice item used in personality tests in which the options are (1)
of equal preference value, i.e., chosen equally often by a typical
group, and are (2) such that one of the options discriminates between
persons high and low on the factor that this option measures,
while the other option measures another factor.

Frequency Distribution. See distribution.

Grade Equivalent (GE). The grade level for which a given score is
the real or estimated average. Grade equivalent interpretation,
most appropriate for elementary level achievement tests, expresses
obtained scores in terms of grade and month of grade, assuming a
10-month school year (e.g., 5.7). Since such tests are usually
standardized at only one (or two) point(s) within each grade,
equivalents between points for which there are data-based scores
must be "estimated" by interpolation. (See extrapolation, interpolation.)

The average test score obtained by typical pupils classi 
fiera™a P'acement (See grade eqniva, cut, norms. 

be administered to a number of individ 
IndlrftaVS^rTtertta^ran"^ <0 only one person 

.ntel^UrOuotlent aO, 

age to his or her ^ g f 5 giy_and particularly for adult 

eliminate “ th k assumed to have ceased-the ratio 

ages, at wb'eh m=n“ « , f^r chronological age This 

q"lQ has been gradually replaced by the deviation 10 con 

cept (SeedevwIwn fO) ^^ess of estimating intermediate 

Interpolation In gen ,^ 5 , „orms, it r^ 

values between tw assigning interpretive values (such 

fees to the procedure between the successive average 

as grade t^SUtvalen 1 ^ standardization process Also, in 

scores actually necessary at times to interpolate to obtain 

reading norm tab es between two scores given in the table, 

a norm ^„„rt here a percentile rank of 83 (from 81 + VS 

e g , m the table s interpolation, to a score of 46 a score 

of 6) to^a percentile rank of 94 (obtained as 87 

of 10) 







,„..r,teubUUy An md.cat.on of what the scores on “ ^ 

how their meaning is denveA or Test rwtecpcetation 

tilhtrcriunon referenced OT norm referenced „f7ipr 

Inventory Test. An achievement test that attempts to cover rather
thoroughly some relatively small unit of specific instruction or
training. An inventory test, as the name suggests, is in the nature
of a stock taking of an individual's knowledge or skill, and is
often administered prior to instruction. Inventories are also used
to measure personality traits, interests, attitudes, problems,
motivation, etc. (See personality test.)

Item. A single question or exercise in a test.

Item Analysis. The process of evaluating single test items by determining
the difficulty value and the discriminating power of the
item, and often its correlation with some criterion.
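A small sketch may make the two quantities concrete. It assumes that the difficulty value is the percent of test takers answering the item correctly and that discriminating power is taken as the difference, between the highest and lowest thirds on total score, in the proportion passing the item. The exact formula intended by the glossary is not stated here, so this choice, the function name, and the sample data are all assumptions.

```python
def item_analysis(responses, item):
    """Return (difficulty percent, discrimination) for one item.

    responses: one list of 0/1 item scores per test taker.
    item: index of the item to analyze.
    """
    n = len(responses)
    third = n // 3
    if third == 0:
        raise ValueError("need at least three test takers")
    difficulty = 100.0 * sum(r[item] for r in responses) / n
    # Rank test takers by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    upper_pass = sum(r[item] for r in ranked[:third])
    lower_pass = sum(r[item] for r in ranked[-third:])
    discrimination = (upper_pass - lower_pass) / third
    return difficulty, discrimination


# Hypothetical scores for six test takers on five items.
scores = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
first_item = item_analysis(scores, 0)
second_item = item_analysis(scores, 1)
```

An item passed by everyone in the top third and no one in the bottom third discriminates perfectly under this index; an item passed about equally often in both groups tells the teacher little about who has mastered the objective.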
Kuder-Richardson Formula(s). Formulas for estimating the reliability
of a test that are based on inter-item consistency and that require
only a single administration of the test. The one most used, formula
21, requires information based on the number of items in the
test, the variance, and the test mean. Kuder-Richardson formulas
are not appropriate for use with speed tests, that is, tests which
measure rate of performance, nor with tests whose items cannot
be scored as either right or wrong.
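Formula 21 is usually written as r = (k / (k - 1)) * (1 - M(k - M) / (k * s^2)), where k is the number of items, M the test mean, and s^2 the variance of total scores. The glossary names these three inputs but does not print the formula, so the sketch below relies on that standard form; the function name and the sample figures are assumptions.

```python
def kr21(num_items, mean, variance):
    """Kuder-Richardson formula 21 reliability estimate."""
    k = num_items
    if k < 2:
        raise ValueError("the test needs at least two items")
    if variance <= 0:
        raise ValueError("the variance of total scores must be positive")
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


# Hypothetical 50-item test with a mean of 30 and a variance of 64.
estimate = kr21(num_items=50, mean=30, variance=64)
```

Because only the number of items, the mean, and the variance are needed, the estimate can be obtained from a single administration without item-by-item records.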

Likert Scale. An attitude scale in which the test taker is given a series
of attitude statements and responds by choosing one of given
choices: strongly agree, agree, undecided (this choice is not always
used), disagree, or strongly disagree.

Matching Item. A short answer test item in which the student must
associate an entry in one list with one in another.

Mean. See arithmetic mean.

Median. The middle score in a distribution or set of ranked scores;
the point (score) that divides the group into two equal parts; the
50th percentile. Half of the scores are below the median and half
above it.
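A brief sketch of finding the median from a set of raw scores follows. The function name and sample scores are illustrative only, and the convention of averaging the two middle scores when the number of scores is even is an assumption, though a common one.

```python
def median(scores):
    """Middle score of a set of scores; averages the two middle scores
    when the number of scores is even."""
    ranked = sorted(scores)
    n = len(ranked)
    if n == 0:
        raise ValueError("no scores given")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


odd_count = median([70, 85, 90, 60, 75])
even_count = median([70, 85, 90, 60])
```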

Mental Ability. Skills, including reasoning, verbal comprehension and
fluency, numerical or quantitative ability, and figural comprehension,
that are influenced by or relate to a learning environment
which reinforces or requires them. More traditionally this has
been called intelligence.

Mental Age (MA). The age for which a given score on an intelligence
or mental ability test is average or normal. If the average score
Ter ,v « T «' ehiMren 6 years 10 months of 

° *° ’’me => "'snhtl 

age of 6-10. (See achievement age, chronological age.)

t'on ''"loently a d.stnbu 

Multiple Choice Item. A test item in which the test taker's task is to
choose the correct or best answer from several given answers or
options.

N. The symbol commonly used to represent the number of cases
or observations in a distribution.

Nominations. The procedure of naming choices or preferences. It is
often used to identify friendship choices, in which case its results
are diagrammed as a sociogram.

Nonverbal Test. A test that does not require the use of words in the
item or in the response to it. (However, oral directions be

^orr^^rLTdrsm,^^^ ~ - s^m'rsi^n™; 

distribution scores or ^a^ous distances above the 

about the below it Cases are concentrated near 

rLrinrir « 

SLi;:rar?5.S^e‘rnofm"lir- been'’urefu. in much test 

Nor^!nrcCr's™P;j/-rone;^^ 

a norm “““"^“e'drawTm (D Ihe P'otled meat 

Norm line A age or grade groups or (2) the succes 

median scores of group 

sue percentile points describe tests designed to pro- 

Norm referenced Test -erformance of test takers relative to one 

vide information on interpretation are obtained from a 

another Usually norm ^ contrasted to a criterion 

iiornuiig group J interpreted on an absolute rather than a 

relative basis ,,,nniv a frame of reference by which meaning 

Norms Statistics that s pur upon the 

may be given to pupils of various grades or ages in the 

actual perf”™'^" for the test Since norms represent average 
standardisation F»npr pp, regarded as standards 

or typ.cnl °™““ble levels of attainment The most common 

or as universally des ^^^^^^^ parcentde ran!: grade eqmvufenl 

types nf "““^pference groups are usually those of specified age 
and statitTte 

or grade , nutcome of instniction stated in such a wa> 

Objective An ^ can be obser\ed and measured Objectives 

that Its constructing test items (Also called behav- 

cfwrve as the 





mol instruct, one! mmstmblc latmcr. or periormance objec 

O T;Tu)'r“i.ch ..oms measurrng a vanety of men 

^al operations are all combmed mto a smgle sequence 
being grouped together by type of operation and (2) 
only a single score is derived, rather than separate scores for each
operation or function. Omnibus tests make for simplicity of
administration, since one set of directions and one overall time
limit usually suffice.

Other Two-choice Item. A short answer item other than true-false
that requires the test taker to classify or categorize objects into
one of two categories (such as a capital city, not a capital city).

Parallel Item Agreement. A reliability procedure for use with criterion-referenced
tests in which performances on the items that were
written to measure the same objective are compared and an indication
of similarity or dissimilarity in these performances noted.
Lack of agreement can then become the basis for item revision.
Percentile. The percent of cases falling at or below a given or indicated
point in the distribution. Thus a score coinciding with
the 35th percentile (P35) is regarded as equaling or surpassing
that of 35 percent of the persons in the group, and such that 65
percent of the performances exceed this score.

Percentile Rank (PR). The expression of an obtained test score in
terms of its position within a group of 100 scores; the percentile
rank of a score is the percent of scores equal to or lower than the
given score in its own group or in an external reference (i.e., norm)
group.
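The definition of percentile rank translates directly into a count. The sketch below is a minimal illustration; the function name and the norm-group scores are hypothetical.

```python
def percentile_rank(score, group):
    """Percent of scores in the group equal to or lower than the given score."""
    if not group:
        raise ValueError("the reference group has no scores")
    at_or_below = sum(1 for s in group if s <= score)
    return 100.0 * at_or_below / len(group)


# Hypothetical reference group of ten scores.
norm_group = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
middle = percentile_rank(60, norm_group)
top = percentile_rank(85, norm_group)
below_all = percentile_rank(39, norm_group)
```

The same function serves whether the reference group is the student's own class or an external norm group; only the list of scores changes.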

Performance Test. A test usually involving some motor or manual
response on the test taker's part, generally a manipulation of concrete
equipment or materials, as contrasted to a paper and pencil
test. The term performance is also used to denote a test that is
actually a work sample; in this sense it may include paper and
pencil tests, as, for example, a test in bookkeeping, in shorthand,
or in proofreading, where no materials other than paper and pencil
may be required and where the test response is identical with the
behavior about which information is desired.

Personality Test. A test intended to measure one or more of the
nonintellective aspects of an individual's mental or psychological
makeup; an instrument designed to obtain information on the
affective characteristics of an individual: emotional, motivational,
auitudinal etc— as distinguished from his or her abilities Per 
’iSi'l'iU'r” =«»1 adjustment imentones 

o ’■>' sclMescripjive re 

S b> oncITlt ""‘"S tltut cal] for 

mung t) ones self or anolher the extent to which a subiect 
po %rsvt-v ccriam traits and <3) opinion or altilude inventones 





More specifically, category (1) may be called tests of personality
orientation, category (2) self-concept, and category (3) attitude.

Power Test. A test intended to measure level of performance unaffected
by speed of response; hence one in which there is either no
time limit or a very generous one. Items are usually arranged in

PraerE°.VcrXTn"‘’;fprnvionscxpcrien« 

later administration “f f = " “„rtad‘of‘ ouesnoL etc 
PmeneVXcrrgr^ates. when the 

fnrwrr a re.at.v?ly novel 

experience for the subjects 

Predictive Validity. See validity.

Product moment Coefficient (r) Also Known 

coeffleieiil “/ ed to determine whether a pupil has 
Proriclcncy Test A test dc ^ instruction or a single 

acquired proficiency “ ® „sul,s g,„e information on what 
fp7pKar%on — 

i^re'rdmt^-ten^nrefe^ „„ 

Profile A graphic representation of the results on several tests, for
either an individual or a group, when the results have been expressed
in some uniform or comparable terms (standard scores, percentile
ranks, grade equivalents, etc.). The profile method of presentation
permits identification of areas of strength or weakness.

Prognosis (Prognostic) Test A test used to predict future success in
a specific subject or field.

Projective Technique (Projective Method) A method of personality
study in which the subject responds as he or she chooses to a series
of ambiguous stimuli, such as pictures, unfinished sentences, etc. It
is assumed that under this free-response condition the subject
projects manifestations of personality characteristics that can, by
suitable methods, be scored and interpreted. The Murray Thematic
Apperception Test is one of the commonly used projective methods.

Quartile One of three points that divide the cases in a distribution
into four equal groups. The lower quartile (Q1), or 25th percentile,
sets off the lowest fourth of the group; the middle quartile (Q2) is
the same as the 50th percentile, or median, and divides the second
fourth of cases from the third; and the third quartile (Q3), or 75th
percentile, sets off the top fourth.





Random Sample A sample of the members of a population drawn in such
a way that every member has an equal chance of being included; that
is, in a way that precludes the operation of bias or selection. The
purpose in using a sample free of bias is, of course, the requirement
that the cases used be representative of the total population if
findings for the sample are to be generalized to that population. In
a stratified random sample, the drawing of cases is controlled in
such a way that those chosen are representative also of specified
subgroups of the total population. (See representative sample.)

Range The difference between the highest and the lowest score on a
test by a group of test takers.

Raw Score The first quantitative result obtained in scoring a test;
for example, the number of right answers, time required for
performance, number of errors, or a similar direct, unconverted,
uninterpreted measure.

Readiness Test A test that measures the extent to which an individ 
ual has achieved a degree of maturity or has acquired certain skills 
or information needed for successfully undertaking a particular 
new learning activity Thus a reading readiness test indicates 
whether a child has reached a developmental stage where he or 
she may profitably begin formal reading instruction 
Recall Item A type of item that requires the test taker to supply the
correct answer from memory or recollection, as contrasted with a
recognition item, in which he or she need only identify the correct
answer. For example,

    Columbus discovered America in the year ______

is a recall item. Both completion and unstructured items are recall
items.
Recognition Item An item that requires the examinee to recognize
or select the correct answer from among two or more given
answers (options). For example,

    Columbus discovered America in
    a. 1425    b. 1492    c. 1520    d. 1546

is a recognition item. All short-answer item formats other than
completion or unstructured require recognition. (See recall item.)
Regression Effect The tendency for students who make extremely
high or extremely low scores on a test to make less extreme scores
(i.e., scores closer to the mean) on a second administration of the
same test or on some predicted measure.

Reliability The extent to which a test is consistent in measuring
whatever it does measure; accuracy, dependability, stability,
trustworthiness, relative freedom from errors of measurement.
Reliability is usually expressed by some form of reliability
coefficient or by the standard error of measurement derived from it.

Reliability Coefficient The coefficient of correlation between two
forms of a test, between scores on two administrations of the same
test, or between halves of a test (corrected by the Spearman-Brown
formula). The three measure somewhat different aspects of
reliability, but all are properly spoken of as reliability coefficients.
(See alternate-form reliability, split-half reliability coefficient,
Spearman-Brown formula, test-retest reliability coefficient,
Kuder-Richardson formula(s); see also parallel-item agreement.)
Representative Sample A sample that corresponds to or matches the
population of which it is a sample with respect to characteristics
important for the purposes under investigation. In an achievement
test norm sample, such significant aspects might be the proportion
of cases from various types of schools, different geographical

Scair^A contmuum marhed off into "T ' ^um fparhcular 
applied to some object or state in order to measure a particular 

properts of it . 

Scholastic Aptitude scale used for measuring 

Semantic piirercntlal A bipolar adj^hve^^^^^ 

s‘iraradjcc“ne and its opposite at each end with seven points 

between as showTi below boring 

mtercstmg— — — — goodness (e%aluation) 

pot 7 n% iemuj le sonJe other more specific property of the 

given sttmulus , 0 ^ which the correct responses or scor 

Short answer Item An tp . s„ that scores are unaffected by 
mg key may be set up ■ ,5 contrasted 

the opinion or judgme different persons may assign 

with an essay type 

different scores ■'“'‘"S" that departs from symmetry or 

Skewed Distribution A oisi normality Scores pile up at 


one end ana irau Giving a presumably unconsciously 




. ,nj,t conn— "i to the test taker s expecta 

^■-"‘;sr;st"no.;iai or socially acceptable response 
tion of what is the m ^ formula giving the relationship between 
Spearman-Brown Formula A formula giving the relationship between
the reliability of a test and its length. The formula permits
estimation of the reliability of a test lengthened or shortened by any
amount, from the known reliability of a test of specified length.
Its most common application is the estimation of reliability of an
entire test from the correlation between its two halves. (See
split-half reliability coefficient.)

Split-half Reliability Coefficient A coefficient of reliability
obtained by correlating scores on one half of a test with scores on
the other half and applying the Spearman-Brown formula to adjust for
the doubled length of the total test. Generally, but not necessarily,
the two halves consist of the odd-numbered and the even-numbered
items.

Standard Deviation (σ, s, SD) A measure of the variability or
dispersion of a set of scores based on the square of the deviation of
each score from the mean. The more the scores cluster around the
mean, the smaller the standard deviation. For a normal distribution,
approximately two-thirds (68.3 percent) of the scores are within the
range from one S.D. below the mean to one S.D. above the mean.

Iano“e ‘called the 

'“"oremT"' (S E M ) Estimate of the magnitude 

score or(2UEmJp"m“sm!ras“‘ “ “t»"dual 

It ts the standarTdemton of Te 'rlT “ 

<!, a ^‘'"‘‘“'^'‘Eondingtnie scores “•’tamed 

“tmnsfored ' ^^^I’n '.e™ 

pressed for reasons of rnn.s.« scores may be ex 

rotation etc Tbe^^pTesrt^^'f inter 

2 score expression of the^ymionn?, ^ 

score of the group m relation to the sta 

scores of the group Thus standard deviation of the 

standard score (j) « score (X) ~ mean fTi 

Adjustments may be 

ard scores haung any dSired 1° ® of stand 

ta tat up The use of surlnrrd";? -"ay 

|?j“"'a standing of the inditlduals m a'^'Ct the 

E°“' ‘•’a original distribution 7- sior« t,^°'''’ '’’^"8= "ta 

n e,n “ "tott often IS Imni, r “ “aan of 

n expressing the raw scores of Standard scores are useful 
■arms in ms.ances nh„rm“L'"Lv ™if “ompaS 

are no, identical m d,rac„S”n“'so s,? *°'™ '’’a' the two forms 

acrJS.r"' '“™ “ '“"tmuous ^ 

Standardized Ten'T'’”"^" '"“'“"S 

d"«.ts s'’c;':e“d"~""'~e'’rr tample of 

preted In referpn ’^'’"fonnance with defimr ^ P*'®scribed 

‘her rcMnci mter 

foruhichS of ‘he term We can fur 

c%aIuaiion and^fo* chosen on 

'iET-“;tS?r£f“:“r sT.i 

Of a number of shon . 

‘^hort answer .mnis where the 





correct answer is provided and the test taker must recognize it.
(See recognition item, matching item, multiple-choice item, other
two-choice item, and true-false item.)

Survey Test A test that measures general achievement in a given
area, usually with the connotation that the test is intended to
assess group status, rather than to yield precise measures of
individuals.

T score A standard score scale using a mean of 50 and a standard
deviation of 10.

Ta,o“Tn™bodimcnt^.he^^^^^^^^^^^^^^^^ 

;:^sTelca"io™''"onres are ^hose of the cognitive and 

alfectne domains j reliability coefficient ob 

Test-retest Reliability Coefficient A type of reliability coefficient
obtained by administering the same test a second time after a short
interval and correlating the two sets of scores.

True-False Item A test question in which the test taker is given a
statement and asked whether it is true or false.

True Score A score entirely free of error; hence, a hypothetical value
that can never be obtained by testing, because testing always
involves some measurement error. A true score may be thought of as
the average score from an infinite number of measurements from
the same or exactly equivalent tests, assuming no practice effect
or change in the examinee during the testings. The standard
deviation of this infinite number of samplings is known as the
standard error of measurement.

wer question that can be answered by 
Unstructured Item A short a ^ completion item 

a word, ph^ase. or contained within the item For example 

'ilhout the bla „ completion item and recall 


a word, . ^,nk contained witnm me uei.. — r-- 

except without the (See completion item and recall 

Who is the author or 


Who is tne dui.— ... 

Item) f ,he suitability or practicality of a test as it 

Usability. An md.catKm For teachers, this would be classroom 

relates to its inten 

use , „ ,vh.ch a test does the job for vvh.ch it is used 

Validity. The extent to 

that IS, measures what it different tests of the same 

Concurrent Validity Th validity might be evidenced by 

property are in ab.hty and of achievement 

wncurrent tccascrcs generally accepted as or 

by the relation of a n eorrelation between scores on a 

Sown to be val® ^Ss that are valid but are less objectjve 
test and criteria maa^ obtain than a test score would be 

and more time aousu "S ^ tests or measures of 

Construct Validity ^ y , properties agree Tests of 

‘'“" diftrent but mShanical apUtude critical thinking 

personality ver 





etc., are validated in terms of their construct and the relation of
their scores to pertinent external data.

Criterion Validity The extent to which a group already proficient
or experienced in the quality measured by a test scores higher
on that test than before they acquired proficiency, or higher
than a nonproficient or inexperienced group. For the validation
of performance tests, trained groups are compared to untrained
groups, or groups are compared before and after training.
Predictive Validity The accuracy with which a test (such as an
aptitude, prognostic, or readiness test) indicates future outcomes
(for example, learning success) in a particular area, as evidenced
by correlations between scores on the test and future criterion
measures (e.g., the relation of score on an academic aptitude test
administered in high school to grade point average over four
years of college).

Variance A measure of dispersion of a set of scores from the mean;
the square of the standard deviation.

z score A standard score expressed in standard deviation units, having
a mean of 0 and a standard deviation of 1.


APPENDIX B PREPARING TEST ITEM SPECIFICATIONS

After you have written your objectives but before you write the
test items, it is advisable to prepare test item specifications. A simple
way to carry out this intermediate step is to use index cards, either
3" x 5" or 4" x 6". At the top of the card write the objective, and below
the objective make a table consisting of three parts: the conditions, the
action, and the criteria.

Following are some examples. Note that the specifications provide
information about the range and kind of conditions that are suitable
for measuring the objective, the specific action or performance
required of the student, and the criteria for evaluation of the
performance.


Example A 

h<il7n ihe nalural and 

human made boundaries the capital and major cities 


CONDITIONS

An outline map of New Jersey with water and human-made boundaries
marked, and three dots, one in position of Trenton, one Newark, and
one Atlantic City.

ACTION

a. Label each body of water and human-made boundary that separates
New Jersey from its neighbors.

b. Each of these dots represents a city. Label each by name and
indicate which is the capital.

CRITERIA

Correct labeling of Delaware River, Delaware Bay, Arthur Kill, Upper
New York Bay, Hudson River, and human-made boundary with New York
State. Cities labeled correctly and Trenton marked as capital.

Example B

Objective: Given a pair of numbers, the student will list their common
factors and choose the greatest common factor.

CONDITIONS

Two numbers between 2 and 100 that have no fewer than two and no
more than four common factors other than 1, e.g., 24 and 8.

ACTION

a. State all common factors.

b. State the greatest common factor.

CRITERIA

E.g., states all common factors, 8, 4, and 2; designates largest, 8.
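The specification in Example B can be checked mechanically. The following is a minimal sketch, not part of the original specification card: the function names `common_factors` and `fits_conditions` are hypothetical, and it assumes that "between 2 and 100" includes both endpoints.

```python
from math import gcd


def common_factors(a: int, b: int) -> list[int]:
    """Return the common factors of a and b other than 1, largest first."""
    g = gcd(a, b)
    # Every common factor of a and b is a divisor of their greatest common factor.
    return [d for d in range(g, 1, -1) if g % d == 0]


def fits_conditions(a: int, b: int) -> bool:
    """Check the CONDITIONS entry: both numbers between 2 and 100 (assumed
    inclusive) with two to four common factors other than 1."""
    in_range = 2 <= a <= 100 and 2 <= b <= 100
    return in_range and 2 <= len(common_factors(a, b)) <= 4


factors = common_factors(24, 8)   # the pair used in the example
greatest = factors[0]             # the list is ordered largest first
```

A teacher could use a check like `fits_conditions` to screen candidate number pairs before writing items, so that no item falls outside the range of conditions stated on the card.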


Example C omitted the 

Ob:ectivs Given a copy ol a P „, ji, necessary commas 

student will demonstrate proper P 


CONDITIONS

Personal letter of 3 or 4 paragraphs, including a heading, salutation,
and closing. The body of the letter will contain compound sentences,
introductory adverbial clauses, words in series, and at least one
appositive.

ACTION

Write in all necessary commas and no unnecessary ones.

CRITERIA

Places commas in heading, after salutation and closing. Uses a comma
to separate the two parts of a compound sentence, after the adverbial
clause, before the word connecting the last two elements in a series
of three or more, and to set off an appositive.





Example D 

Given the equipment and directions, the student vnU set up the 
Zp"rr,. tnrcLu?l S, eteclrccal expenment ,n accordance w,th Iha 
instiucttons 


CONDITIONS

All equipment and directions to do an unfamiliar experiment in
electricity, e.g., wiring a circuit to light a bulb. (Note, however,
that the student should be familiar with the equipment and its use.)

ACTION

Carry out the experiment in accordance with the directions to produce
the results called for.

CRITERIA

All equipment is used properly (safely and as called for in the
directions). All procedures that are called for are followed.
Experiment produces desired result, e.g., the bulb lights.


Writing specifications is an important step because it insures that
the item will measure the objective and that the range of conditions
suitable for the objective will not be exceeded. In preparing the
specifications, do not repeat the wording of the objective; the point is to
increase the level of detail preparatory to writing the items. At the
same time, specifications should be broad enough to cover two items
unless (as in Example A) one item would exhaust all possibilities. The
final step is, of course, writing the items themselves and entering them
on the back of the index card.


APPENDIX C AN EXAMPLE OF STANDARDIZING SCORES


Consider the distribution of 30 test scores (raw scores) shown in
Figure C-1. Note that the plot of 30 scores approximates a normal
distribution (that is, it is highest in about the middle). For illustrative
purposes each of the above 30 scores has been converted into a
(1) z score, (2) T score, (3) percentile score, and (4) stanine score.
These scores are shown in Figure C-2. (The mean of the distribution
of the 30 scores is 74.0; the standard deviation is 14.6.) The standard
deviation was computed using the formula

$$sd = \sqrt{\frac{N\sum X^2 - \left(\sum X\right)^2}{N(N - 1)}}$$
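As a check on the figures quoted above, the computation can be sketched in a few lines. This is an illustration only: the list of 30 raw scores is a reconstruction from Figure C-2 (an assumption, since the printed figure is damaged), and the formula used is the raw-score computing formula with N(N - 1) in the denominator, which is what the surviving fragment of the printed formula suggests.

```python
from math import sqrt

# Raw scores as reconstructed from Figure C-2 (assumed values, highest first).
scores = [99, 97, 95, 92, 90, 88, 85, 83, 82, 80, 80, 79, 76, 76, 75,
          75, 75, 74, 72, 70, 70, 68, 65, 63, 60, 57, 54, 50, 47, 43]


def raw_score_sd(xs):
    """Standard deviation from raw-score sums, with N(N - 1) in the denominator."""
    n = len(xs)
    sum_x = sum(xs)                     # sum of the scores
    sum_x2 = sum(x * x for x in xs)     # sum of the squared scores
    return sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


mean = sum(scores) / len(scores)
sd = raw_score_sd(scores)
print(round(mean, 1), round(sd, 1))
```

With these reconstructed scores the mean and standard deviation round to the values quoted in the text, which supports the reading of the denominator as N(N - 1) rather than N squared.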

We can now convert the 30 raw scores into both z scores and T
scores using the following standard score formula:

$$\text{standard score} = M + S\left(\frac{X - \bar{X}}{sd}\right)$$

where M  = predesignated standard score mean
      S  = predesignated standard score standard deviation
      X  = raw score
      X̄  = mean of raw scores
      sd = standard deviation of raw scores

(Remember that the z score has a predesignated mean of 0 and
standard deviation of 1, and the T score has a predesignated mean
of 50 and standard deviation of 10.) Note that if the z scores are
algebraically summed, the sum is approximately zero (actually -0.1
in this case because of rounding errors).
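The conversion just described can be sketched as follows. The function names are hypothetical; the defaults of 74.0 and 14.6 are the mean and standard deviation of the 30 raw scores quoted in this appendix, and the standard score is taken to be the predesignated mean plus the predesignated standard deviation times the raw-score deviation in sd units.

```python
def standard_score(x, raw_mean, raw_sd, target_mean, target_sd):
    """Rescale raw score x to a scale with the predesignated mean and sd."""
    return target_mean + target_sd * (x - raw_mean) / raw_sd


def z_score(x, raw_mean=74.0, raw_sd=14.6):
    """z score: predesignated mean 0, standard deviation 1."""
    return standard_score(x, raw_mean, raw_sd, 0, 1)


def t_score(x, raw_mean=74.0, raw_sd=14.6):
    """T score: predesignated mean 50, standard deviation 10."""
    return standard_score(x, raw_mean, raw_sd, 50, 10)


for raw in (99, 74, 43):
    print(raw, round(z_score(raw), 1), round(t_score(raw)))
```

Because both scales come from the same linear formula, a T score is simply 50 plus 10 times the z score; neither conversion changes the rank order of the students or the shape of the distribution.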


Figure C-1 Frequency Distribution of 30 Test Scores


Percentiles were calculated based on the percentile formula given
below:

$$\text{PERCENTILE} = \frac{100 \times (\text{no. of scores exceeded})}{\text{total number of scores}}$$


Note the recurring pallern 03 T — except in the cases of ties where 
adjustments were made in the denominator. Other than for ties, any
set of 30 scores, once rank-ordered, would receive the same percentiles
regardless of the individual raw scores, because the percentile is based
only on relative position or standing and not at all on the absolute
magnitude of the raw score.
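The percentile formula, and the point that percentiles depend only on rank, can be illustrated with a short sketch. The function names are hypothetical, and the helper deliberately assumes there are no tied scores, since the text does not spell out the tie adjustment.

```python
def percentile(n_exceeded, total):
    """Percentile = 100 x (number of scores exceeded) / (total number of scores)."""
    return 100 * n_exceeded / total


def percentiles_without_ties(scores):
    """Pair each score with its rounded percentile; assumes no tied scores."""
    ranked = sorted(scores, reverse=True)
    total = len(ranked)
    # The score at position i (0 = highest) exceeds total - 1 - i other scores.
    return [(s, round(percentile(total - 1 - i, total)))
            for i, s in enumerate(ranked)]


# Two very different sets of raw scores receive identical percentiles.
set_a = [p for _, p in percentiles_without_ties([10, 30, 20])]
set_b = [p for _, p in percentiles_without_ties([1, 1000, 2])]
```

In a group of 30, the highest score exceeds 29 others and the next exceeds 28, so each step down in rank lowers the percentile by 100/30, or about 3.3 points, whatever the raw scores happen to be.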

The stanine scores represent the top and bottom 4%, the next top
and bottom 7%, the next top and bottom 12%, the next top and bottom
17%, and middle 20% of the scores. Note that the stanine score
conversion results in as many as six raw scores having the same stanine
score, a decided disadvantage of the stanine score. This is somewhat
compensated for by not having large jumps in rank scores for small
jumps in raw score in the middle of the distribution. Moreover, for
ease of interpretation there is an obvious advantage to having a
possibility of only nine scores as opposed to a number equal to the
number of students taking the test (in this case 30).
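The stanine bands can be turned into counts for a class of a given size. The sketch below is one possible reading, not the author's stated procedure: it rounds each band's share of the group to a whole number of students and assigns stanines by rank. For group sizes other than 30 the rounded counts may not add up to the group size, and tied scores may be split across bands, so treat it as an illustration only.

```python
# Percentage of scores in each band, from stanine 9 (top) down to stanine 1.
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]


def stanine_counts(n_scores):
    """Number of scores per stanine (9 down to 1), rounding each band's share."""
    return [round(n_scores * p / 100) for p in STANINE_PERCENTS]


def assign_stanines(scores):
    """Pair each score with a stanine by rank order (assumes counts sum to N)."""
    ranked = sorted(scores, reverse=True)
    result = []
    position = 0
    for stanine, count in zip(range(9, 0, -1), stanine_counts(len(ranked))):
        for score in ranked[position:position + count]:
            result.append((score, stanine))
        position += count
    return result


counts_for_30 = stanine_counts(30)
```

For 30 students the middle band holds six scores, which is the "as many as six raw scores having the same stanine score" noted in the text.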


Figure C-2 The Conversion of 30 Raw Scores on a Test to z-Scores,
T-Scores, Stanine Scores, and Percentiles

RAW                          STANINE
SCORE   z SCORE   T SCORE    SCORE     PERCENTILE

 99      +1.7       67         9          97
 97      +1.6       66         8          93
 95      +1.4       64         8          90
 92      +1.2       62         7          87
 90      +1.1       61         7          83
 88      +1.0       60         7          80
 85      +0.8       58         7          77
 83      +0.6       56         6          73
 82      +0.5       55         6          70
 80      +0.4       54         6          65
 80      +0.4       54         6          65
 79      +0.3       53         6          60
 76      +0.1       51         5          55
 76      +0.1       51         5          55
 75      +0.1       51         5          46
 75      +0.1       51         5          46
 75      +0.1       51         5          46
 74       0         50         5          40
 72      -0.1       49         4          37
 70      -0.3       47         4          31
 70      -0.3       47         4          31
 68      -0.4       46         4          27
 65      -0.6       44         4          23
 63      -0.8       42         3          20
 60      -1.0       40         3          17
 57      -1.2       38         3          13
 54      -1.4       36         3          10
 50      -1.6       34         2           7
 47      -1.8       32         2           3
 43      -2.1       29         1           0

appendix D I)I«|/; 

Vmcncan Guidance ^ 55014 
Publishcn'BIdp C.rcIcP.n« Mmn 

Hie Bobbs Ind 46206 

1300\V62dSl Ind.anipo"s in 

Bureau of Education^ CoUcec'^Emporn Kami, u> 

Kansas Stale Teachers uoi 

-^1 Rcstarch and Sen lee 
Burttu of Educalional K ^2240 

Umsersmotloua louuW 

„ -,,/McCra"' HiU Book Ci> i, 

California Test Bun Monterej Cilif 93940 

Del Monte Research part- 

•e Press Inc 

Consulting f Calif 94306 

577 College Ase Palo Alt 

Counselor R<t'^‘’''J',"f,“n‘'Nashville Tenn 37212 

Bos6164Ael<lenStatio^ 

Educational and Indusln 

PO BO, 7234 S="’’-!'^,„,,„dtheER,ca. 
Educational ^^^^’Pc^aluation) 

Measurement and t 

Princeton NJ 

HarcourtBraceJovanf^^,k NY 10017 
757 Third A\enue 


Third 


> I'l ! I ft Ilf 111. 





Houghton Mifflin Company
2 Park St., Boston, Mass. 02107

Institute for Personality and Ability Testing
1602 Coronado Dr., Champaign, Ill. 61820

Instructional Objectives Exchange
Box 24095, Los Angeles, Calif. 90024

Personnel Press
20 Nassau St., Princeton, N.J. 08540

The Psychological Corporation
757 Third Avenue, New York, N.Y. 10017

Psychological Test Specialists
Box 1441, Missoula, Montana 59801

Psychometric Affiliates
1745 Monterey, Chicago, Ill. 60643

Science Research Associates, Inc.
259 E. Erie St., Chicago, Ill. 60611

Sheridan Supply Co.
P.O. Box 857, Beverly Hills, Calif. 90213

Stanford University Press
Stanford, Calif. 94305

Western Psychological Services
Box 775, Beverly Hills, Calif. 90213



Answers 


Chapter 1 

(1) b 

(2) False 

(3) To decide what it is you want to measure 

(4) c 

(5) appropriateness 

( 6 ) d 


Chapter 2 . . 

(1) An Intended outcome stated in such a way that its attainment can be 
determined to state desired outcomes, to plan strategies for and 
praduce the desired outcomes to measure to what extent desired 

(2) ?rsrrrford,T?lor?eahng and cootung implements, to prepare 
meats to senJe meals, to clean eating and cooking implements, to 

provide a social setting 

(4) kTheSmgV'odlo knorwlns'Zeotedrf by providing a basis 

' ’ for fuCng I' expectations 

(5) c 

(6) False have been used and all three 

(7) To be correct, an action g „,ap he United Steles, 

parts oMhe “Jl/, by POmf-"» «•<> '-^es, river system, he 

the student shall Missouri fiiver. 

or ^'’^^'’‘‘''^"'Loon^trtems. each mth two 3 digit numbers, the 
'=> ftSjeS^eThfcorrecrans^ 

(9) h concomitant and subsequent objectives 

(10) In proper relation lo p fa,| ,t also fail those objectives 

Determined by seeing ^ objective that is prerequisite to i» 

for which it IS a prereq 

( 11 ) d 

(12) c 


Chapter 3

(1) a III, b IV, c I 

(2) c 

(3) c 

(4) a False 
b True 


(5) 

( 6 ) 


application 
affective domain 




(7) State the names of your state's two senators.

(8) Create an accompaniment for a familiar melody.

(9) E.g., removing a tire, putting on a tire, tightening bolts.

(10) E.g., defining fascism and democracy, presenting examples of fascist
governments (or describing principles of fascism), presenting
examples of democratic governments (or describing examples of
democracy), discussing advantages and disadvantages of fascism in
comparison with democracy.

Chapter 4 


(1) a 
b 

(2) A 

(3) b 

(4) a 
b 
c 

(5) . 


state 

identify 

lil.Blv.CIi 

True 
False 
False 


r-aise 

pre s, denis o( the United Stales were 

• '’true® mlse'®® 

oircte tne na.ee ,n„ee pecpte wn;'':ie^^n^fe^d States pres,- 

J Adams 
A. Burr 
P Henry 
T Jefferson 
G Washington 
A Hamilton 

• '--rd 

in the proper order? Presidents of the United States 

'irsl president " "“""PS on the right 

second presiden, 

tPIrd president MePison 
Hamilton 
Washington 

■ IS the resulf or detferson 

•7 + 5- “"“'“'•ding 7 and 37 

•3 + 2-6 


true false 





• Circle tho correct oddlUons 

2 + 1»3 9 + 6*= 16 

3 + 5»7 8 + 2»11 

4*f*4*^8 S + B^ll 

• Which one of the followhg results when 5 end 6 are added? 
a. 9 

b 10 

c. 11 

d. 12 

• Iho number on the rrgbl which, when added to the number 
on the loft, gives 13 as the result 

1. 4 a e 

2 6 b 6 

3 9 .b.S 

4 5 d 7 

G A 

(7) c 

(8) a 


Chapter 5

(1) a Iv. b V, Cl, d II 

'I ^6'g! on accaslon, 01^ wrtb 

"/ao? d^uTaVorragas have on prroasv Name at /east taro 

possible reasons larccchshcrugjs^^ opossums, beavers, and wood- 

(4) E g , describe I ve ways „„ „,ore 

chucks ere a"”"®/ ° oosef/phon cl each similarity or dillerence 

than one sentence to y hew to use carrots and celery to 

15 ) Eg, In 300 words or ess <, p„„,0,o 

make a device that ‘blls t ^ „,oans ot solving the prob- 

(6) Eg, evaluate their ' , , criteria and apply to the 

terns ol eggresslen SPeony^a^^^^ 

treaty rn an accuracy of solution, completeness 

(8) b,c,e,l,h 


I77g students win emoy coming to schoot, students wi„ tee, that 
school IS students will respect one 

® Loihms privacy ^ ^ 

S:e:antiod:;,er:nt:;, 

(5) a, b, d, g, b 





» a Because ,h,e ,s a pmaty faCaa, sta.e^n., s.uden.s v,opM have no 

basts for sgreemenl or disagieement teacher would 

t Becaose to.a alalamant contains two ttoughta a taactiot w 
not know which ono tho student was responding , 3 . 

(7) Eg , liking tor the subiect matter, liking for the presentat , 

(8) rg?''r".''treres. value, convenience, expense, firsthand 
learning 

(9) • Field trips add relevancy to the learning process.

SA A U D SD

• Field trips are really interesting to take.

SA A U D SD

• Field trips provide a good basis for firsthand learning.

SA A U D SD

• Field trips are too expensive and inconvenient.
SA A U D SD
• Field trips are a waste of time.
SA A U D SD

• I would rather stay in school than go on a field trip.
SA A U D SD
(IQ) • Semant/cD/ff«ren(/a/ 

FteLD TRIPS 

pleasant unpleasant 

irrelevant 




« Adjective Checklist 

Field trips are pleasant 

(check all that apply) irrelevant 

• Nominations

1. Name your three favorite educational activities.

2. Name the two ways you most enjoy learning about different
industries and the jobs they provide.

(11) d

(12) E.g., use or adapt existing measures where possible; try the measure
on your colleagues; do not require names on papers; let students
decide what information to share; do a briefing and a debriefing; be
sensitive to students' fears; be responsive to student feedback.


Chapter 7 

(1) See the list of criteria on pages 179-80.

(2) E.g., given an analytical balance and a sample substance,
demonstrate a procedure for determining the weight of the substance. Given
tools, materials, and a plan, construct a bird feeder. Reason: Each
objective calls for a real performance or product.

(3) E.g., given a piece of construction paper, compass, and pencil,
construct an exact equilateral right triangle.

(4) a. Uses compass to lay off equal sides





b Uses compass to construct the perpendicular properly — ^ 

0 Resulting triangle has equal sides and contains a right angle 

d Work Is neat and construction lines removed — 

(5) E g , given a ruler, graph paper, and colored pends draw the layout 
el this floor ol the school 

a ®Dravring contains all areas of this llrror of the school 

d All spaces are accurately labeled 

e Drawing is neat 

(7) Student motivation 

a Comes on time 
b Volunleers lor activities 

c Completes assignments 

d Asks questions 
e Shows enthusiasm 

(8) Student OCCA- 

SION FRE- 

NevER rarely ALLY OUENTLY ALWAYS 

Comes to class P 0 F A 

on time 

Volunteers for P q F A 

activities 

Completes m R ^ ^ 

assignments m B ^ a 

Asks questions ^ P q ^ ^ 

Shows enthusiasm preparedness m teaching a lesson 

1 'Hifa’p^C' «ants to do 

4 Is well organized 

I Seransrquestions 

(10) Teacher BEHAVroR BEHAVIOR 


Pr“e"ed“na°smoothlashion 

Is well organized 

r;::iVto\n°w"que=t.ons 


BEHAVIOR 

PRESENT 





(11) Affective objectives that deal with such behaviors as cooperativeness
and involvement should not be neglected; as part of overall student
evaluation, rating of such behaviors would be useful.

(12) As values are clarified and change, student behavior should also
change. (For example, students might become more cooperative or
more involved.) Measurement of student behavior would help to
determine whether the experimental program was stimulating
positive changes in the behavior of students.

Chapter 8 

(1) c 

(2) Does it fit my objective? Does it reflect the action verb? Does it
utilize the conditions? Does it employ the criteria?

(3) a. List the three objectives.

b. Determine how many items you want to write for each based on
its relative importance.

c. Draw the map indicating the number of items per objective.
d. Follow the map to construct items.

(4) 



UNITS OF IMPORTANCE | 


1 

2 

3 

4 

1 


® 



2 


® 

® 

® 

3 

® 

® 


® 

4 

® 

® 




(5) two 

(6) « < b fy c H d V 

Ino objcclrvc) iipms 6 1 ^ '''appropriale 

-»"=>'oa,,aa.p„p,pa.,a,aaZ “ZZXecZ'r" 

19) False 
(10) a d e 

(tt) a // b /y c V 








Chapter 9 


( 1 ) b 

(2) external 

(3) d 

(4) B 

(5) a It, b V, c I, d m 

(6) a. criterion, b. construct, c. concurrent, d. predictive

(7) a. Relate scores on the test to scores on a subsequent test of
achievement in electronics or to ratings of successful performance
in electronics.

b. Compare the scores of people employed in the electronics field
(that is, people trained in electronics) with scores of people
employed in another field (that is, people untrained in electronics).

(8) a Lk students how much they like school and see if those who say 
‘ ’ fhey like It score higher on the test than those who say they don t 

b see'll there IS a relationship between scores 'hb 

waet.mftri causes of these scores, such as warmth of the teacher, 
'^.4 f T!'nl the teacher toward students, or number of successful 
attitudes of the teacher wwa between liking lor school and its 

ZTs":"ns rbrP-^cts, such as attendance deportment, 
or school performance ratings of their arfistio 

(9) Compare examine and rate represenlative 

^‘’'''‘'[l.^of’Lch students work, then compare those ratings with the 
tesTscores, compare students’ test scores to their own ratings of 

Iheirylistlc abH^y of professional colleagues, see if students do 

(10) Solicit the judgmen P 

better on them after fa g ^ students, see 

see If ,3 themselves higher on knowledge and experience 

ln?Sd dS bSer on them than persons who rate themselves lower 


Chapter 10 


d II, 


(1) c 

(2) True 

(3) d 

(4) False 

(5) a V, b I. 

(6) a split-half 

b parallel-item 
c Kuder Richardson 21 
d test-retest 
e alternate forms 


e III 





ma lasting and general (^aracteristrcs— reading ability . 

b lasting and specific characteristics— knowledge required by p 
ticular test items 

c temporary and general characteristics— fatigue 
d temporary and specific characteristics— luck (For more exam- 
ples, see Figure 10 3 on page 263 ) 

T, b £, c T, d £, e £ 


(8) a 

(9) a, b, c, e 
(1D) longer 

(11) a. Separate the tests of the high scorers and low scorers and
compare the performance of each group on each item. Take those
items on which the two groups had about the same success rate
and revise or replace them with ones that correspond more closely
to the content of the unit.

b. Identify those items that the majority of students got wrong and
find the answer choices that were selected more often than the
ones you designated as correct; see if these choices are possibly
correct. If they could be considered correct, change them to
improve the items.

c. Identify in advance the ideas that must be presented for each
item in order to earn different amounts of credit; determine the
point allocation per essay response for the different features (e.g.,
organization, creativity); do not look at student names; score
one-fifth of the responses twice.

(12) This test was designed to measure six objectives with two items per
objective; students' performance on each pair of items per objective
can be examined and those pairs where half or more of the students
got one member of the pair right and one wrong identified; a close
examination of the items in these nonparallel item pairs can form
the basis for a revision or replacement of at least one member of
the pair.


Chapter 11 

(f) scores or performance 

(2) a Information about the performance (that is, test results) of a spe- 

cific group of people on which subsequent test score interpreta- 
tion IS based ^ 

'"'erpretalion 

(3) a percentile rank 

b grade-equivalent score 
c standard score 
d sianine score 

111 1 ^ ® average 

15) b, d, e 

Adv=„,a,es enables ,nd,.,dua, scces ,a be tepre.ed by oompa,, 





( 10 ) 


son to group data, makes conclusions based on test scores sotne- 

of performances ,o obscure the relation between 

ttrontenro':T^"srtelt’d 'any bearing on tbe past or future ,i e , 
real capabilities) of the the test takers 

(7) a skill-testing 
b intrinsic 

(8) a, c, d obiective that each student has 

^ ?p\^s“sed''’ratSe?rhan reporting simply the total number of items 

right on the totaUest ..-ossed’ an objective if he or she 

" ^eTs"b1th'‘cMhelm°s right that were written to measure pro- 

ticiency per objective versus ^ 0o,o„ 50 percent 

r msruc^mrtrrhafrr-st b^e considered less than 

and divide them up in g represents his or her 

each student a number score ti 

d smndard '°"oWre'trthe'cLs as a whole on a pmjec 

WrTnd renVave^hem write their test answers to questions bas 
o°n the one being o that students can understand them 

^ "=Srles., terns per Ob, ectw^ 

o Cut down on "t® ^ ,Po jesi ,n more than one sitting 

, S^doTnairtheproceduresfortestadministration 

0 ,x c vr, d vii, e vru, f P " 

(1) a V, D 

^ot:?jrequ.res Its aPPhcation for success 


( 11 ) 


(12) 





(7) a mental, b chronological, c rath 

(8) a IQ«=101; b M A =11 years, Smooths 

(9) b 

00) d 

01) b.c 

{12) a II, III, iv, v/, vtl, lx, X 
b I, V, vlli 
C I, VII 
d II 

e lx 
I IV 

03) e 

(15) Supp,, a word relared ,as a p,ed,a„pp „„k) ,p aach o, ,Pree p.ver, 

06) c 

07) b 

08 ) 6 

Chapter 13 
0) d 

oblectrves whoreas slandardTOd achieva,^“''? "i®'' 

a serrerai co„,e„, achievemam lasts are built around 

0) Reliability because fev.^.«« u . 

(5) Fal^e"”"”" ™ allows for the 

^®) 1 0. 2 c 3 a 4 t> 

cacM S” '"'i adminis- 

iPssi^i 

<« l^ake sura all ,t„da„„ '"'1 '«""9 

““"LTplUntmT -I'lslr? 

'or 'n tho duacUonsTHs" '="'0'!/ piva atnl """ol 

’'udanis know how '"""a ekacllv m 1"°"® ‘"‘"o'l 

points tn the room Monitor tes/takf 





(10) a i grade-equivalent score, ii percentile rank, iii raw score, iv
standard score, v stanine score

(11) You would know how ®you" 

ha, mam 0 1 ives h smdennro'r L no. prCicien, in 
exactly what math objectives

(12) On a . green, of the obiectives or, if the test was 

:r:L7e''ohTeo^bve.cnS5pe;ce„.^^ 

rernsMgm than sl'plrcen, of .he students in the norming group a, 
the same grade level sublesis number of scores types of 

(14) a either the Davis or Nelson-Denny

b Gates-MacGinitie 

c Iowa Silent Reading Test 


Chapter 14

(1) a SDS, b Kuder, c OVIS, d Strong, e CMI

(2) Scoring: The Strong is scored by comparing responses to successful
people in different fields. The Kuder is scored on a number of
different interest orientations. The OVIS is scored on 24
predetermined categories that reflect emphases on data, people,
and things.
Interpretation: The Strong tells you how much the student's interests
resemble those of successful people in different fields. The Kuder
tells you about the student's interest orientations
that themselves can be related to different occupational
fields. The OVIS tells you the occupations to which the student
is oriented based on his or her relative emphasis on data, people,
and things.

(3) a School Sentiment Index, b Tennessee, c Self-appraisal

(4) se/f e.b.d.I.eohocl^c.e^S.I’^ ^ ^ppg ^ CTP 

(7) Helping students in making career decisions; evaluating and
increasing the effectiveness of classroom activities aimed
at the affective domain

(8) As an example: Administer the School Sentiment Index before and
after trying out a set of activities in the classroom to see if the use of
these activities makes students' attitudes toward school more positive.
Another: Administer the Scale of Values to students and
have them score it themselves as a way of becoming aware of
their own values





Chapter 15 

(1) Teacher-built tests constitute a major portion of the testing
program so it is obviously important to improve them; a test is a
resource like a curriculum and hence should be reused;
it results in tests that are more valid and reliable

(2) a Teacher prepares test items to fit instructional objectives

b Teacher uses items in classroom testing

c Teacher uses test results to evaluate items in terms of the five
criteria

d Teacher discards poor items, retains good ones

(3) a Make sure the test is appropriate for measuring the objectives
of instruction. Record the date proficiency on an objective is
acquired and the degree of proficiency attained.

b Select a test that measures objectives in sequence within a wide
range, in particular covering objectives that should have been
mastered prior to instruction. Examine the student's performance
on this test to identify prior objectives on which the student has
not acquired proficiency.

c Determine from test results whether there are any objectives that a
particular student has failed to acquire proficiency on (or any
students who have failed to acquire proficiency on a particular
objective) and provide additional instruction appropriate to that
objective.

d Give a pretest or at least precede instruction by presenting the
objectives; give a posttest and return scored tests with minimal
delay; clearly indicate errors and bases for scoring; test in a warm
and accepting atmosphere.

(4) See Figure 15.2, page 44$

(5) Greatest strength: spelling (52nd percentile)

Greatest weakness: math applications (42nd percentile)

Instructional recommendation: spend more time teaching math
applications and less time teaching spelling

(6) Divide the test of all objectives to be covered into subtests and the
class into the same number of subgroups as subtests for testing
purposes. Give each subgroup a different subtest at each testing time
so that after all the objectives have been covered each subgroup will
have taken each subtest. Using this procedure, pretest, immediate
posttest, and retention data will be provided for the different
subgroups as a basis for evaluating the results of instruction.

(7) a Educational input can serve as a baseline from which to measure
gains in performance 


b 


c 


We are more adept at measuring how students perform (output)
are being carried out

and to discover which processes produce the desired outcomes





(8) Compare the 6th graders using the new instruction before and after
to see how much they improve. Compare the end results on this
year's 6th graders, who have used the new instruction, with last
year's (in the same school) who have not. Compare the end results
on this year's 6th graders, who have used the new instruction,
with the end results for 6th graders in a comparable school in the
district not using the new instruction.
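
The subgroup rotation described in answer (6) can be sketched as a simple schedule. This is only an illustration under stated assumptions: the function name is hypothetical, subtests and subgroups are numbered from 0, and a cyclic shift is one convenient way (not the only way) to make sure each subgroup takes each subtest exactly once.

```python
def rotation_schedule(n_subtests: int) -> list[list[int]]:
    """Return schedule[time][subgroup] = subtest taken at that testing time.

    There are as many subgroups and testing times as subtests. Shifting the
    assignment by one at each testing time gives every subgroup every
    subtest once, and every subtest is given to some subgroup at every time.
    """
    if n_subtests < 1:
        raise ValueError("need at least one subtest")
    return [
        [(subgroup + time) % n_subtests for subgroup in range(n_subtests)]
        for time in range(n_subtests)
    ]


# Hypothetical case: three subtests, three subgroups, three testing times.
for time, row in enumerate(rotation_schedule(3)):
    assignments = ", ".join(
        f"subgroup {group} takes subtest {subtest}"
        for group, subtest in enumerate(row)
    )
    print(f"testing time {time}: {assignments}")
```

Because every subtest is given at every testing time, each set of objectives is measured in some subgroup before, shortly after, and well after it is taught, which is the basis for the pretest, posttest, and retention comparisons the answer mentions.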



References 


American Association for the Advancement of Science. An evaluation
model and its application. Washington, D.C.: AAAS Commission on
Science Education, 1965.
American Educational Research Association, American Psychological
Association, National Council on Measurement in Education.
Standards for educational and psychological tests. Washington,
D.C.: American Psychological Association, 1974.
Amidon, E. J. and Hough, J. B., eds. Interaction analysis: Theory,
research, and application. Reading, Mass.: Addison-Wesley, 1967.
Ammons, M. Objectives and outcomes. In R. L. Ebel, ed. Encyclopedia
of educational research, 4th ed. New York: Macmillan, 1969,
908-14.
Anastasi, A. Psychological testing, 3rd ed. New York: Macmillan, 1968.
Anderson, H. Acquiescence response bias to difficult achievement type
true-false tests of male high school students exhibiting rule breaking
or rule obeying behavior. Unpublished doctoral dissertation,
Rutgers University, New Brunswick, N.J., 1969.
Anderson, S. B. et al. Encyclopedia of evaluation. San Francisco:
Jossey-Bass, 1974.
Anghoff, W. H. Scales, norms & equivalent scores. In R. L. Thorndike,
ed. Educational measurement, 2nd ed. Washington, D.C.: American
Council on Education, 1971, Chap. 15.
Armstrong, R. J. et al. Development and evaluation of behavioral
objectives. Belmont, Calif.: Charles A. Jones Publishing Co., 1970.
Asch, S. E. Effects of group pressure upon the modification and
distortion of judgments. In E. E. Maccoby, T. M. Newcomb and E. L.
Hartley, eds. Readings in social psychology, 3rd ed. New York:
Holt, Rinehart and Winston, 1958, 174-83.
Ashburn, R. R. An experiment in the essay type question. Journal of
Experimental Education, 1938, 7, 1-3.

Backman, M. and Tuckman, B. W. A review of the Remote Associates
Test. Journal of Educational Measurement, 1972, 9, 161-162.
Barclay, J. R. Controversial issues in testing. Boston: Houghton Mifflin,
1968.
Barron, F. The measurement of creativity. In D. K. Whitla, ed. Handbook
of measurement and assessment in behavioral sciences.
Reading, Mass.: Addison-Wesley, 1968, 348-66.

Ciifriire Fair Scales Information Bulle 
T^rn^g ^,573 Institute for Personality and Ability 



Bauernfemd.R H Building a sdmot testing program Boston Hough 
Beatty! WH"ed“np™v, Mg educational assessment mid “ 

Ji measures of affective behavior Washing on D C Associat.on 

B G Differential Aptitude 

M ^ L New Yorh The Psycho.ogtca. 

Berg^TE— n tn socta. scence ^ "^7"“"°" 

^n higher education Boston f ‘e ,909 

enfants Tumman J J Reading tests for the second 

^'^Ty'gradl:”^ -d evaluation Newark, De, Intemat.ona, 

Reading Association 1972 Handbook I Cog 

Bloom, B S 7-“”"“’%“t"‘'CdMcKay.l» 

nitive domain N®"' . uiinian characteristics New York 

StahiUty and change 

Wiley, 1964 , nigdaus, g F Handbook on formative 

Bloom, B S , Hastings ' , ^mUent learning New York McGraw 

and summative evaluation o, 

H'”' U 11 B J and McLemore, S D Sociological measure 

Bonjean, j^Hill.^R and indices San Francisco Chandler 

Publishing Co '967 ,5 ,55, „ New Republic, 1923, 34, 

Boring E G Intelligence as 

history of experimental psychology New York Appleton 
Century Croft^ 1950 ^ observation and recording of 

Boyd, R B end De cg'ucattonal Research, 1966, 36, 529-51 

behavior Tlei’ieieo/ testing of a behavioral reference 

Brown, W The develoP'ne vocational education pilot pr^ 

groups model for ev ^^,^vnt of Education Occupational 

pams New Jersey No 4, 1970 , „ ^ mt 

Research DevelW^^^^^^ a„j reviews Highland Park, NJ 

Gryphon press. and reviews Highland Park, N J Gryphon 

Press, 1970 menial measurements yearbook Highland 

Park.M-J of interests within an occupation o\er 30 

CampbeU fjJoJ^ApphedPsychology.me 50.51-56 (a) 




Stability of vocational interests within occupations, over long 

mg Research and Development Center University of Pittsburgh 
Report No BR 5-0253 1966 , . . . 

Crites, J. O. Interests. In R. L. Ebel, ed. Encyclopedia of educational
research, 4th ed. New York: Macmillan, 1969, 678-86.
Vocational psychology. New York: McGraw-Hill, 1971.
Cronbach, L. J. Validation of educational measures. Proceedings of the
1969 Invitational Conference on Testing Problems. Princeton, N.J.:
Educational Testing Service, 1969, 35-52.
Essentials of psychological testing, 3rd ed. New York: Harper
& Row, 1970.
Test validation. In R. L. Thorndike, ed. Educational measurement,
2nd ed. Washington, D.C.: American Council on Education,
1971, Chap. 14.
Cureton, E. E. Measurement theory. In R. L. Ebel, ed. Encyclopedia of
educational research, 4th ed. New York: Macmillan, 1969, 785-804.
time spans Personnel and Guidance Journal 1966 44 1012-19 (b) 

Manual for Strong Vocational Interest Blanks for men and
women, rev. ed. Stanford, Calif.: Stanford University Press, 1966.
(c)
Carter, H. D. The development of vocational attitudes. Journal of
Consulting Psychology, 1940, 4, 185-91.
Carver, R. P. Two dimensions of tests: psychometric and edumetric.
American Psychologist, 1974, 29, 512-18.
Cattell, J. McK. Mental tests and measurements. Mind, 1890, 15, 373-81.
Cattell, R. B. and Warburton, F. W. Objective personality and motivation
tests: A theoretical introduction and practical compendium.
Urbana, Ill.: University of Illinois Press, 1967.
Chauncey, H. and Dobbin, J. E. Testing: Its place in education today.
New York: Harper & Row, 1963.
Churchman, C. W. The systems approach. New York: Delacorte Press,
1969.
Coffman, W. E. Achievement tests. In R. L. Ebel, ed. Encyclopedia of
educational research, 4th ed. New York: Macmillan, 1969, 7-17.
Essay examinations. In R. L. Thorndike, ed. Educational measurement,
2nd ed. Washington, D.C.: American Council on Education,
1971, Chap. 10.

Coleman W and Cnreion E E Intelligence and achievement The 
mem 195?/?347* sT" Psychological Measure 

m ^ lor memal 

ml im ® Inst. 

corapanson o[ .ten, sdent.on techniques for 
nom, referenced and entenon referenced tests Pittsburgh Learn 





Davis, F. B. Educational measurements and their interpretation.
Belmont, Calif.: Wadsworth Publishing Co., 1964.
D'Costa, A. G. and Winefordner, D. W. A cubistic model of vocational
interests. Vocational Guidance Quarterly, 1969, 17, 242-49.
Dewey, J. What psychology can do for the teacher [1895]. In Archambault,
ed. John Dewey on education. New York: Modern Library, 1963.
Dick, W. and Hagerty, N. Topics in measurement: Reliability and
validity. New York: McGraw-Hill, 1971.
Doppelt, J. E. How accurate is a test score? Test Service Bulletin No.
50. New York: The Psychological Corporation, 1956.
Dunn, L. M. Manual for the Peabody Picture Vocabulary Test. Circle
Pines, Minn.: American Guidance Service, 1965.
Ebel, R. L. Obtaining and reporting evidence on content validity.
Educational and Psychological Measurement, 1956, 16, 269-82.
Standardized achievement tests: Uses and limitations. National
Elementary School Principal, 1961, 40, 29-32.
Measurement and the teacher. Educational Leadership, 1962,
20, 20-24.
Some limitations of criterion referenced measurement. Paper
presented at the American Educational Research Association,
Minneapolis, 1970.
Educational Testing Service. Making the classroom test: A guide for
teachers. Princeton, N.J.: Educational Testing Service, Evaluation
& Advisory Service Series No. 4, 1959.
Graduate Record Examination scores for basic reference
groups. Princeton, N.J.: Educational Testing Service, 1960.
Multiple choice questions: A close look. Princeton, N.J.:
Educational Testing Service, 1963.
Anchor test study: Equivalence and norms tables for selected
reading achievement tests (Grades 4, 5, 6). Washington, D.C.: U.S.
Government Printing Office (stock no. 1780 01312), 1974.
Edwards, A. L. Techniques of attitude scale construction. New York:
Appleton-Century-Crofts, 1957.
The measurement of personality traits. New York: Holt,
Rinehart, and Winston, 1970.
Farr, R. and Anastasiow, N. Tests of reading readiness and achievement:
A review and evaluation. Newark, Del.: International Reading
Association, 1969.
Farr, R., Tuinman, J. J., and Blanton, B. E. How to make a pile in
performance contracting. Phi Delta Kappan, 1972, 53, 367-69.
Feuerstein, R. A procedure to improve the intellective performance of
young children. Paper presented at Rutgers University, New
Brunswick, N.J., 1968.

Findley, W. G. The impact and improvement of school testing programs.
Sixty-second yearbook of the National Society for the Study
of Education, Part II. Chicago: University of Chicago Press, 1963.
Flanders, N. A. Analyzing classroom interaction. Reading, Mass.
Freeman, F. S. Theory and practice of psychological testing, rev. ed.
New York: Holt, Rinehart and Winston, 1955.
Theory and practice of psychological testing, 3rd ed. New York:
Holt, Rinehart and Winston, 1962.
Fry, E. Judging readability of books. Teacher Education, 1964, 5, 34-39.
Gagné, R. M. The conditions of learning. New York: Holt, Rinehart and
Winston, 1965.
The conditions of learning, 2nd ed. New York: Holt, Rinehart
and Winston, 1970.
Essentials of learning for instruction. Hinsdale, Ill.: The
Dryden Press, 1974.
Galton, F. Inquiries into human faculty and its development. London:
Macmillan & Co., 1883.
Gardner, E. F. Interpreting achievement profiles: uses and warnings.
NCME measurement in education: A series of special reports of
the National Council on Measurement in Education, 1 (2), 1970.
Gerberich, J. R. Specimen objective test items. New York: Longmans,
Green & Co., 1956.
Gerberich, J. R., Greene, H. A. and Jorgensen, A. N. Measurement and
evaluation in the modern school. New York: David McKay, 1962.
Gerhard, M. Effective teaching strategies with the behavioral outcomes
approach. Nyack, N.Y.: Parker Publishing, 1971.
Getzels, J. W. and Jackson, P. W. Creativity and intelligence:
Explorations with gifted students. New York: Wiley, 1962.
Glaser, R. Instructional technology and the measurement of learning
outcomes. American Psychologist, 1963, 18, 519-21.
Glaser, R. and Nitko, A. J. Measurement in learning and instruction. In
R. L. Thorndike, ed. Educational measurement, 2nd ed. Washington,
D.C.: American Council on Education, 1971, Chap. 17.
Goldman, L. Using tests in counseling. New York: Appleton-Century-
Crofts, 1961.
Graham, M. Modern elementary mathematics. New York: Harcourt
Brace Jovanovich, 1970.
Granger, C. H. The hierarchy of objectives. Harvard Business Review,
1964, May-June, 63-74.
Green, J. A. Teacher-made tests. New York: Harper & Row, 1963.
Guilford, J. P. The nature of human intelligence. New York: McGraw-
Hill, 1967.
Harvey, O. J., Hunt, D. E. and Schroder, H. M. Conceptual systems and
personality organization. New York: Wiley, 1961.
Hawes, G. R. Educational testing for the millions: What tests really
mean for your child. New York: McGraw-Hill, 1964.





Heron, A. The effects of real life motivation on questionnaire response.
Journal of Applied Psychology, 1956, 40, 65-68.
Hively, W. et al. Domain referenced curriculum evaluation: A technical
handbook and a case study from the MINNEMAST Project. Los
Angeles: Center for the Study of Evaluation, UCLA Graduate
School of Education, CSE Monograph Series in Evaluation No. 1,
1973.
Hoepfner, R. and Klein, S. CSE elementary school test evaluation. Los
Angeles: Center for the Study of Evaluation, UCLA Graduate
School of Education, 1970.
Hoepfner, R. et al., eds. CSE-RBS test evaluations: Tests of higher
order cognitive, affective, and interpersonal skills. Los Angeles:
Center for the Study of Evaluation, UCLA Graduate School of

HopkinrK V and ’'FmarRip^"1-rojecf No ’oToff 

Washmgton,®Sc 0 ^“/' "" 
Departuien. ^ Heallh, Edu^^^^^^ 

Instmctional J objectives Exchange, 1970 

— 'MafhiZt "cs7-Poh,ect,ves Los Angeles Instructional Objec 

TRfadmf'ts^obiectives Los Angeles Instructional Objectives 

^Z^MMdeZvard school K-n.rev ed Los Angeles Instructional 

Objectives EitcE^n^' Angeles Instruc 

Measures of ® ,972 

tional Objectives Exc referenced tests Princeton N J 

Jackson, R De\eopi ® _ Measurement, & Evaluation, Educa 

ERIC Clearinghouse on ^ ^ 

tional Testing ^ genetics Implications for education 

Jensen, A R Research Journal, i96S 5,1-42 

American Educatto scholastic achievement’ Har 

How much can we 1_I23 

vard £du«fionuI Review, - 

J W Tests and measurements m child 
Johnson,© G San Francisco Jossey Bass, Inc , 1971 

development A hi^aa Rosenzweig. J E The theory and man 
Johnson, R A , KasL F McGraw Hill. 1967 

agement of syste ' ^^etnent test Principles and procedures. 
Katz M Selecting an gducaUonal Testing Service (Evaluation 

2nd ed Series, No 3), 1961 

and Advisory Serv Miles, D Behavioral objectives and in 

Kibler, R J > Barker, » ^ Bacon, 1970 

struction j^bution of developmental psychology to educa 

Kohlberg. L The con 





tion-exatoples from moral education Educational Psychologist 

Krathwohl, D. R., Bloom, B. S. and Masia, B. B. Taxonomy of educational
objectives. Handbook II: Affective domain. New York: David
McKay, 1964.
Krathwohl, D. R. and Payne, D. Defining and assessing educational
objectives. In R. L. Thorndike, ed. Educational measurement, 2nd ed.
Washington, D.C.: American Council on Education, 1971, Chap. 2.
Krech, D., Crutchfield, R. S. and Ballachey, E. L. Individual in society.
New York: McGraw-Hill, 1962.
Kriege, J. W. Behavioral objectives: 10 ways to make them count.
Grade Teacher, 1971, Sept., 138-43.
Lake, D. G., Miles, M. B. and Earle, R. B. Measuring human behavior:
Tools for the assessment of social functioning. New York:
Teachers College Press, 1973.
Lennon, R. T. Assumptions underlying the use of content validity.
Educational and Psychological Measurement, 1956, 16, 294-304.
Scores and norms. In R. L. Ebel, ed. Encyclopedia of educational
research, 4th ed. New York: Macmillan, 1969, 308.
Levine, A. S. Aptitude versus achievement tests as predictors of
achievement. Educational and Psychological Measurement, 1958,
18, 517-25.
Lien, A. J. Measurement and evaluation of learning. Dubuque, Iowa:
Wm. C. Brown, 1967, Chap. 6.
Light, R. J. Issues in the analysis of qualitative data. In R. M. Travers,
ed. Second handbook of research on teaching. Chicago: Rand
McNally, 1973, 318-81.
Lord, F. M. The relation of the reliability of multiple choice tests to the
distribution of item difficulties. Psychometrika, 1952, 17, 181-94.
Lyman, H. B. Test scores and what they mean. Englewood Cliffs, N.J.:
Prentice-Hall, 1963.
Mager, R. F. Preparing instructional objectives. Palo Alto, Calif.:
Fearon Publishers, 1962.
Measuring instructional intent. Palo Alto, Calif.: Fearon
Publishers, 1973.
Malcolm, D. D. Which interest inventory should I use? Journal of
Educational Research, 1950, 44, 91-98.

Mallinson G G and Cnimnne W M An mvesligation of the stabdity 
sL'rc'r'Sz « iw'sr*'””' Educational Re 

”'^Row 1^70*’ obiecuves New York Harper & 

rather than for , ntefhgel.ee 

. Pijc/iotogisi 1973 28 1-14 

wf; ”87[^82 ■"■'"'6'"“ Why’ American Psychologist 





Michael, W. B. Prediction. In R. L. Ebel, ed. Encyclopedia of educational
research, 4th ed. New York: Macmillan, 1969, 982-93.
Miller, D. C. Handbook of research design and social measurement, 2nd
ed. New York: David McKay, 1970.
Millman, J. Passing scores and test lengths for domain referenced tests.
Review of Educational Research, 1973, 43, 205-16.

Ian, 1969, 667-77 j T^„„^„baum P H The measurement of 

Osgood, C E Suci O J 5 I957 

Page’,''E’B"Ha«‘«c all faded at performance contracting Pin Delta 

kappan. measurement of learning outcomes 

o' 

P.agm^“''r^'7oeS; o/^W- Mew Vork Harconrt Brace 

_^%he origins of mtelUgenee me, uldren New York W W Nor 

ton, 1952 .^liditv of arguments against behavioral 

Popham, W J ’’''obing meeting of American Educational Re 

goals Pop"' P^=»e"‘ed 7,^8 

search Association t-nicag measure obiectises 

Popham, W J “"“I® „ , prenlice Hall 1973 

Englewood ClilTs MJ , ,mca, attitudes Ann Arbor. Mich 

Robinson, J P ot d .],|.5e3rch Unnersity of Michigan 1968 

Institute for Social aldliides mid occupational charac 

Measures 0 / or™?^^ ° Institute for Social Research Umver 

sity of Michigan ^ J^fea5ltres of social psychological atti 

Robinson J p tmd Shave Research University 

tildes Ann Arbor, Mien 

of Michigan mg bulletin Princeton NJ Educational Test 

m’g^’ervice Pygmalion in the classroom New York 

Winston Stanford Bine, In 

Saltier, J fe sSte'po™ ’’f 

^','' 173-79 objective tesf> Psychological Reports 1958 

Scheier I H 
4 147-57 





Schoer, L. A. Test construction: A programmed guide. Boston: Allyn &
Bacon, 1970.
Seibel, D. W. Measurement of aptitude and achievement. In D. K.
Whitla, ed. Handbook of measurement and assessment in behavioral
sciences. Reading, Mass.: Addison-Wesley, 1968, Chap. 8.
Shaw, M. E. and Wright, J. M. Scales for the measurement of attitudes.
New York: McGraw-Hill, 1967.
Simon, A. and Boyer, E. G., eds. Mirrors for behavior: An anthology of
classroom observation instruments. Philadelphia: Research for
Better Schools, 1967-70.
Solomon, R. J. Improving the essay test in the social studies. In H. D.
Berg, ed. Evaluation in social studies. Washington, D.C.: National
Council for Social Studies, 1965, 137-53.
Spearman, C. The abilities of man. New York: Macmillan, 1927.
Stalnaker, J. M. The essay type of examination. In E. F. Lindquist, ed.
Educational measurement. Washington, D.C.: American Council
on Education, 1951.
Stanley, J. C. Reliability. In R. L. Thorndike, ed. Educational
measurement, 2nd ed. Washington, D.C.: American Council on Education,
1971, Chap. 13.
Stefflre, B. The reading difficulty of interest inventories. Occupations,
1947, 26, 93-96.
Stewart, N. AGCT scores of Army personnel grouped by occupation.
Occupations, 1947, 26, 5-41.
Stoddard, G. D. The meaning of intelligence. New York: Macmillan,
1943.
Strang, R. How to report pupil progress. Chicago: Science Research
Associates, 1961.
Strong, E. K., Jr. Vocational interests of men and women. Stanford,
Calif.: Stanford University Press, 1943.
Strong-Campbell Interest Inventory, The. Stanford, Calif.: Stanford
University Press, 1974.


Super, D. E. et al. Vocational development: A framework for research.
New York: Teachers College, Columbia University, Bureau of
Publications, 1957.

Super, D. E. and Crites, J. O. Appraising vocational fitness, rev. ed.
New York: Harper & Row, 1962.

P^'<:’‘<»'’Stcal tests Descriptions 
md^cl«sroom implKonons Spnngfield HI Ctatles C Thomas 


Temjan^Lr/" 

ThoniJ ihe R 1 . Persomet selection New York Wikv 1949 

resfmgVioLm, E 

19M inccton NJ Educational Testing Service 





Thorndike, R. L. and Hagen, E. Examiner's manual, Cognitive Abilities
Test (Multi-Level Edition). Boston: Houghton Mifflin, 1971.
Thurstone, L. L. Primary mental abilities (Psychometric Monograph
No. 1). Chicago: University of Chicago Press, 1938.

comjnent Prininr, 

Mental Abilities Manual of msirnctuins Chicago Science Re 
T '"p'w‘'rdc!aA W%V Test coordinator shaiidbool California 

rests Monteie^ Calif California Test Bureau/ 
McGraw Hill 1970 a „,itnber I CaUfoniia Achievement 

— st^tt-er^' airktrrtt Bureai/McGrau HU. .970 
^ ^hiectne lest In R L Thorndike ed 

Kuder Preference "f gXctino-M' «“ciirc;, 1944 37 538-44 
Blank for Men ^^jel New 

Tuckman B W Dejelopme Unitersity (US Office of Education 
Brunswick ^ 3 1957 (ERIC retrieval no ED 016 083) 

Final Report 0^6 In B W Tuckman & 

Thepsycholo^of lhc f“ disudvuiiwged New 

J L o Brian eds 

York nculums for occupational preparation and 

The study nJ Rutgers University (U S Office of 

education New Brunsv (ERIC retrieval no 

Education Final ivep 

ed 044 525) At.rntional research New York Harcourt Brace 

Condticung educaiw 

Jovanovich 1972 of Project Open Classroom Wayne 

Second year e Wajme Board of Education 1973 

N J Project ppUcation of psychological constructs In R 

___ — Teaching PanWgc points for study 2nd ed Phila 

H Hyman ed 

delphia Lippi*^^ n,erman M Beyond Pygmalion Galatea in the 
i,^on B W and ^ York meetm? of American Educa 


^ R W 

Tuckman d J'' 


schools Association 

tional rochran D Travers E Evaluating the open 

Tuckman B W Research and Development m Education 

classroom 7o« 

1974 8 14-iy Corman M N Measuring lA competencies A 
Tuckman B cooperative Tests of Industrial Arts Journal of 

r^:atmiifm‘‘^"‘’"‘''’ 





B W aud Eduards K I * 
desicn and management Educattaml Technolopr 
Tuddenham, R. D. Intelligence. In R. L. Ebel, ed. Encyclopedia of
educational research, 4th ed. New York: Macmillan, 1969, 654-64.
Tyler, R. W. Educational evaluation: New roles, new means. The sixty-
eighth yearbook of the National Society for the Study of Education,
Part II. Chicago: The University of Chicago Press, 1969. (a)
The purposes of assessment. In W. H. Beatty, ed. Improving
educational assessment and an inventory of measures of affective
behavior. Washington, D.C.: Association for Supervision and
Curriculum Development, NEA, 1969. (b)
Vargas, J. Writing worthwhile behavioral objectives. New York:
Harper & Row, 1972.
Varon, E. J. Development of Alfred Binet's psychology. Psychological
Monographs, 46, No. 3, 1935.
Walker, D. K. Socioemotional measures for preschool and kindergarten
children: A handbook. San Francisco: Jossey-Bass, Inc., 1973.
Wallach, M. A. and Kogan, N. Modes of thinking in young children: A
study of the creativity-intelligence distinction. New York: Holt,
Rinehart and Winston, 1965.
Webb, E. J. et al. Unobtrusive measures: Nonreactive research in the
social sciences. Chicago: Rand McNally, 1966.
Wechsler, D. The measurement of adult intelligence. Baltimore:
Williams & Wilkins, 1944.
Weick, K. E. Systematic observational methods. In G. Lindzey and E.
Aronson, eds. Handbook of social psychology, Vol. 2, 2nd ed.
Reading, Mass.: Addison-Wesley, 1968, 357-451.
Wesman, A. G. Writing the test item. In R. L. Thorndike, ed. Educational
measurement, 2nd ed. Washington, D.C.: American Council
on Education, Chap. 4.
Witkin, H. A. Individual differences in ease of perception of embedded
figures. Journal of Personality, 1950, 19, 1-15.
Witkin, H. A. et al. Personality through perception: An experimental
and clinical study. New York: Harper & Row, 1954.
Psychological differentiation. New York: Wiley, 1962.
Wodtke, K. H. Some data on the reliability and validity of creativity
tests at the elementary school level. Educational and Psychological
Measurement, 1964, 24, 399-408.
Womer, F. B. Test norms: Their use and interpretation. Washington,
D.C.: National Association of Secondary School Principals, 1965.

^ N k Test use In R L Ebel ed Encyclopedia 

of edticatta^al research 4ih cd NewYork Macmillan 1969 1461- 


69 

Wood, D. A. Test construction: Development and interpretation of
achievement tests. Columbus, Ohio: Charles E. Merrill, 1960.





Worthen, B. R. and Clark, P. M. Toward an improved measure of remote
associational ability. Journal of Educational Measurement,
1971, 8, 113-23.
Worthen, B. R. and Sanders, J. R. Educational evaluation: Theory and
practice. Worthington, Ohio: Charles A. Jones, 1972.
Wylie, R. C. The self concept: A critical survey of pertinent research
literature. Lincoln, Neb.: University of Nebraska Press, 1961.
Yamamoto, K. Creative writing and school achievement. School and
Society, 1963, 91, 307-308.



Index of Authors*


American Association for the Advancement
of Science, 27, 28-30
American Educational Research
Association, 249
American Psychological Association, 249
Amidon, E. J., 202
Ammons, M., 71
Anastasi, A., i5d, 381
Anastasiow, N., 404
Anderson, H., 2d7
Anderson, S. B., 470
Anghoff, W. H., 308
Armstrong, R. J., 45
Asch, S. E., 221
Ashburn, R. R., 735


Cattell.R B.439 
Chauncey, H , 17 
Churchman, C W , 2lff 
Clark, P.M, 354 
Clark. W.W. 377, 379 
Cochran, D , 438 
CofTman,W.E. 136.404 
Coleman, W , 382 
Comrey, A L , 439 
Corman, M N , 376 
Cox, R . 294 

Crites, J. 0 . 412. 420. 422 
Cronbach, L J . xvi, 242, 249, 254 
383.409 

Crumnne, W M , 414 
Crutchfield. R S.139 
Cureton.E E, 249. 382 


Backer. T. E . 439 
Backman, M , 354 
Baker, E . 225 
Ballachey, E L , 139 
Barclay, J. R . 17 
Barker, L , 45 
Barron, F , 357 
Barton, K , 234 
Bauemfemd, R H , 470 
Beatty. W H , 439 
Bennett. G. K , 238, 351 
Berg, H. D , 100 
Bierman. M , 423 
Binet, A xix, 313, 374-/6 
Blanton. W. 269, 404 
Bloom, B 

60 62. 64, 67, HI. 112n. 114, 

116, 117. 119. 338, 357 
Bommanto, J W . 439 
Bonjean.C M . 439 
Bonng, E G.319 
Boyd. R D . 202 
Boyer. E G . 202 
Bracht. G H , 339 
Brown, W , 172, 175 
Euros, O K, 50,341, 404, 437 

Campbell D P, 409, 413 
Carter, H D , 409 
Carver, R P,2« 

Cattell T McK.xir 


Davis, F B , 470 
D'Costa.A 0.415 
Delon, F. 0.404 
DeVault.M V.202 
Dewey, J , 141 
Dick. W. 249. 275 
Dobbin, J E , 17 
Doppelt, J E , 275 
Dunn.L M , 233. 330. 33/. 345 

Earle R B . 166 
Ebel, R L . 17, 225. 296. 378 
Educational Testing Service. 91, 
96 103, 120. 383. 404 
Edwards. A L . 147. 152. 166. 437 
Edwards. K J . 35 

Farr. R , 269 . 404 
Feuerstein. R , 321 
Fjndley, W C , 470 
Fitts, W H.423 
Flanders, N A . 202 
Freeman, F S , 324, 357 
Fry, E , 248rt 

Gagne R M , 24 43, 322, 323 
Galton, F„ xiv 
Gardner, E F . 308 
Gerberich, J R 107 
Gerhard, M , 45 
Getzels.J W,354 


Glaser, E M , 439 
Glaser. R_, 294, 470 
Goldman, L , 470 
Graham M , 52 
Granger, C H . 59 
Green, J A , 107, 136 
Greene. H A , 107 
Guilford, J P, 320-21, 325 353 

Hagerty, N , 249, 275 
Hagen, E , 340 
Harvey, 0 J , 116 
Hastings, J T , 24 
Hawes, G R . 357 
Heron, A . 414 
HilI.R J,439 
Hively, W , 217, 218. 219 
Hoepfner, R , 357, 437 
Holland,!, 419 
Hopkins, K D , 339 
Hough, J B.202 
Hunt, D E , 116 

Jnstructional Objectives Exchange 
54, 56, 58, 425, 427, 439 

Jackson.? W.3S4 
Jackson, R , 294 
Jacobson. L , 192. 193-94 322 
Jensen, A R . xvii, 322 355 
Johnson, 0 G , 439 
Johnson R A , 45 

Kast, F E , 45 

Katz. M , 404 

Kibler, R J , 45 

Klein S.3S7 

Kogan, N , 119 354 

Kohlberg, L , 142 

Krathwohl,/? R.4} 42-43,67 71 

Krech, D . 139 

Kriege, J W , 35 

Lake. D G,166 
Lennon, R T , 225, 308 
Levine, A S , 382 
Lien, A J , 202 
Light, R J , 248 
Lord, F M , 268 


* Page numbers in italics refer to figures and boxes; n refers to footnotes.


Index of Subjects and Tests*


Ability tests, continuum of, 382 
Accommodation, 321 
Achievement' 
language arts, 365-66 
listening, 372-73 
mathematics, 367-68 See also 
Mathematics 

reading, 363-65 See also Reading 
relation to intelligence, 324-25, 
381-53 

science, 370-72 
social studies, 368-70 
study skills, 371-72 
tests, 363-404 See also Achieve 
ment tests 

Achievement tests, 363-404 
administration, 383-85 
applications, 445-65 
criterion referenced, 391-99 
interpretation, 385-99 
items, 363-73 
new trends in, 402 
properties 375-85 
relation to IQ tests 381-83 
standardized 363-75 
teacher-built vs standardized 

375-81 . ^ 

Acquiescence response bias, 2Zi 
248 

Action verbs

illustrations of, 2fr-JU 
Adaptability. 318-19 
Adjective checklist, 145, 162-63 
Adjustment, 432-33 
Administration of tests, 265, 268-69, 292, 303, 383-85

Affective domain 

measurement of, 64-67, 139 -n> 
423 

taxonomy of, 41-43 
AS"at.o„,«.14MM38 
Affective measures, 409-39 

S'^afso^^Attiludes: Attitude 

Alternate forms reliability, 261, 262

Analysis 

* Page numbers in italics refer to figures


measurement of, 1I4-I6, 123-24 
taxonomizing of, 39, 41 
See also Cognitive domain 
Anticipated achievement score
390 

Application 

measurement of, 111-14 
taxonomizing of, 39, 41 
See also Cognitive domain 
Appropriateness II, 211-24 230 
304, 375-78, 449 
Aptitude, 235, 236-38, 324-25 
Assimilation, 321 
Attitudes:
  measurement of, 139-66, 422-28
  toward role of women, 153-55
  toward school, 149, 427-28
  toward self, 162-63, 422-26
  See also Attitude scales
Attitude scales, 143-66
  adjective checklist, 145, 162-63
  attitude statements for, 147-53
  bipolar adjective scale, 145-46, 157-62
  illustrations of, 154-55, 159, 160-62, 162-63, 164
  Likert, 144, 153-57
  two-point, 144-45, 153-57
  use of, 164-66, 438-39
  See also Attitudes
Attitude statements:
  writing of, 147-53
Attitude tests. See Attitudes; Attitude scales

Behavior:
  measurement of, 188-201
  See also Behavior scales
Behavioral objective. See Objectives
Behavior scales:
  construction of, 96-W
  illustrations, 191, 198, 200-01
  use of, 199-201
Bender Gestalt Test, 3
Binet-Simon Intelligence Scale. See also Stanford-Binet Intelligence Scale
Bipolar adjective scale, 145-46, 157-62
BITCH Test, 222
Bloom’s taxonomy. See Taxonomy


California Achievement Tests, 363n, 376, 377, 379, 390n, 402
California Psychological Inventory, 429-31
California Test of Mental Maturity, 212, 339, 346
California Test of Personality, 432-33
Career:
  education, 409, 419, 420, 422, 438
  maturity, 420-22
  orientation. See Interest inventories
Career Maturity Inventory, 420-22
Certifying student progress. See Student progress
Checklist. See Adjective checklist; Performance checklist
Chronological age, 332-33
Class:
  analysis, 387
  summary, 456
Classroom:
  achievement monitoring, 455-58
  applications of test data, 454-58
Classroom observation scale, 200-01, 468
  See also Behavior scales
Cognitive Abilities Test, 340, 347-48
Cognitive domain:
  measurement of, 60-64, 77-106, 111-36
  taxonomy of, 38-41
  verbs for, 26-27
  See also Analysis; Application; Comprehension; Evaluation; Knowledge; Synthesis
College Entrance Examination Boards, 282, 283, 321
Completion item, 79-82, 106
Comprehension:
  measurement of, 77-106





Index of Authors


Lyman, H. B., 308

Madaus, G. F., 24
Mager, R. F., 24, 45, 70, 71
Malcolm, D. D., 412
Mallinson, G. G., 414
Masia, B. B., 41, 42-43, 67
McAshan, H. H., 45
McClelland, D., xix, 357
McLemore, S. D., 439
McNemar, Q., 357
Michael, W. B., 17
Miles, D. T., 45
Miles, M. B., 166
Miller, D. C., 166
Millman, J., 225
Myers, S. S., 404

Nelson, M. J., 357
Nitko, A. J., 470


Osgood, C. E., 145, 157, 158, 160

Page. E D IV 
Payne, D. A., 17, 71
Pemberton, C., 138
Piaget, J., 321
Popham, W. J., 37, 225

Robinson, J. P., 166
Rosen, P., 439
Rosenberg, N., 417
Rosenthal, R., 192, 322
Rosenzweig, J. E., 45

Sanders, J. R., 17
Sattler, J. M., 343
Scheier, I. H., 107
Schoer, L. A., 107
Schroder, H. M., 116
Seashore, H. G., 233, 351
Seibel, D. W., 404
Shaver, P. R., 166
Shaw, M. E., 166

Simon, A., 202
Simon, T., 314-16
Solomon, R. J., 116
Spearman, C., 316, 317, 318
Stalnaker, J. M., 136
Stanley, J. C., 275
Stefflre, B., 412
Stessan. N . 341 
Stoddard, G. D., 318
Strang, R., 470
Strong, E. K., 409, 412
Suci, G. J., 145, 157, 158, 160
Super, D. E., 422, 439

Tannenbaum, P. H., 145, 157, 158, 160

Tarctan C 357 

Terman, L., xv, 332
Thorndike, R. L., 262, 263, 275
Thurstone, L. L., 139, 317-18
Tiegs, E. W., 377, 379
Tinkelman, S. N., 71
Travers, E., 428
Triggs, F. O., 417
Tuckman, B. W., 35, 86n, 160n, 172, 173, 195, 354, 376, 423, 428
Tuddenham, R. D., 357
Tuinman, J. J., 269, 404
Tyler, R. W., 470

Vargas, J. S., 45, 294
Varon, E. J., 315

Wahl, N. K., 470
Walker, D. K., 439
Wallach, M. A., 119, 354
Warburton, F. W., 459
Webb, E. J., 428
Wechsler, D., xvi, 318
Weick, K. E., 202
Wesman, A. G., 107, 238, 351
Williams, R. L., 222
Winefordner, D. W., 417
Witkin, H. A., 436, 437, 438
Wodtke, K. H., 354
Womer, F. B., 308, 470
Wood, D. A., 107
Worthen, B. R., 17, 354
Wright, J. M., 166
Wylie, R. C., 439

Yamamoto, K., 354





  taxonomizing of, 41
  See also Cognitive domain
Comprehensive Tests of Basic Skills, 363n, 390n
Concurrent validity, 230, 232-34, 235, 243-44
Construct validity, 230, 235-38, 245
Content map, 213, 215
  See also Content outline
Content outline, 67-69, 149, 153-54, 213, 377
  See also Content map
Content validity. See Appropriateness
Cooperative School and College Ability Tests, 345
Correlation coefficient, 232-34, 23f, 237-38, 245n, 77 Im, 272.i, 3W, 343
  relation to causation, 239-40, 322

Creativity:
  tests of, 353-54
Criterion keying, 410, 415
Criterion referenced tests, 391-94, 443, 44W7, 455, 468
Criterion referencing, 11n, 13, 279, 293-97, 299, 304-06, 391-99
Criterion validity, 230, 240-42, 244, 245-47
Culture:
  bias, 223
  rtfynencc, 222
Culture Fair Scales of Intelligence, 234, 349-50


Data processing performance test, 178
Davis Reading Test, 400
Deviation IQ, 233, 333-34
  See also Intelligence quotient; Mental age


Educational:
  effectiveness, 459-69
  goals, 466-67
  input, 459, 463, 465-67
  output, 459, 460, 461-65
  process, 459, 460, 461, 465-69
Edumetric dimension, 210, 239, 325
Edwards Personal Preference Schedule, 5, 433-34
Effectiveness of education, 459-69
Electronics performance test, 175-76, 241-45
Embedded Figures Test, 436-38
Enabling objectives, 33-34
Essay item, 60, 111-36
  construction of, 111-24
  reliability of scoring, 131-34, 135, 273-74
  scoring of, 124-31, 132-34
Evaluation:
  measurement of, 119-24
  of educational effectiveness, 459-69
  of instruction, 300-01, 457-58
  of student performance, 300
  taxonomizing of, 39, 41
  See also Cognitive domain

Face validity, 223
Faking, 220, 412, 414, 423, 433
Feedback. See Student feedback
Field dependence, 438
Field independence, 438
Food processing performance test, 174-75
Functions of testing:
  classroom, 454-56
  individual, 445-54
  system, 459-69


Diagnosis:
  of strengths and weaknesses, 449-52
  of student deficiencies, Wl
Diagnostic testing, JJ jis «
  See also Diagnosis
Dictionary of Occupational Titles (DOT), 417, 419
Differential Aptitude Tests, 2’^n 2tl sn js;
rMioriKTo of rrsprriK Sfi
DstnSjibr, cj ,,,
Domain referencing, 217-19

Grade-equivalent score, 289-92
Tnu
*00-01
E*aminai.on

Henmon-Nelson Tests of Mental Ability, 348-49
Higher cognitive processes:
  measurement of, «3^ in x
  See also Analysis; Cognitive domain; Evaluation; Synthesis


Individual:
  applications of test data, 445-54
  diagnosis, 449-52
  prescription, 452
  test record, 286, 386, 451
Individually Prescribed Instruction, 376
Individual Pupil Monitoring System, 392-94, 395
Input, 459, 461, 465-67
Instructional:
  evaluation, 457-58
  objective. See Objectives
  prescription, 452
Instructional Objectives Exchange, 37
Intelligence, 312-57
  abstract, 322
  associative, 322
  cultural influence on, 222, 354-57
  definitions of, 313-25
  factors of, 318, 320
  general, 313-17
  issues in testing, 354-57
  items to measure, 325-29
  properties of tests of, 334-41
  reporting of, 335
  scores of, 329-34, 335-37
  stability of, 335-39
  tests of, 341-52
  See also Mental ability
Intelligence quotient, 332-34, 336-37
  See also Deviation IQ; Mental age
Interest inventories, 409-20
  Kuder, 412-17, 418, 419
  Ohio, 417-19
  Self-directed Search, 419-20
  Strong, 409-12, 413, 414, 415, 416, 417, 418
Interest level, 267

Interpretability, 279-301, 306, 385-99
  criterion referenced, 11n, 13, 279, 293-97, 299, 304-06, 391-99
  determination of, 298-301
  norm referenced, 11n, 13, 279-93, 301, 385-91
  tests, 379-81
Interpretation of test scores. See Interpretability
Interrater reliability, 131-34, 135
Interval scale, 128-30
Iowa Silent Reading Tests, 401-02





Iowa Tests of Basic Skills, 340, 363n, 380, 383
Ipsative measure, 434n
Item:
  analysis, 271-72, 373, 3W
  cluster lalKmj:, 415
  file, 443-44
  form, 218
  See also Short-answer item; Essay item

Jury, 247-48
  See also Panel of experts

Knowledge:
  measurement of, 77-106
  taxonomizing of, 38, 39, 40
  See also Cognitive domain
Kuder Form DD Occupational Interest Survey, 412n
Kuder Form E General Interest Survey, 412-17, 418, 419, 434n
Kuder-Richardson formula 21 reliability, 258-59, 262

Language arts, 365-66, 377
Learning ability, 321-22
Likert scale, 144, 153-57
Lorge-Thorndike Intelligence Tests, 339, 347, 383

Matching item, 100-04, 106
Mathematics:
  achievement, 367-68, 402
  performance, 173
  score report, 396-97, 398-99
Maturity Index, 190, 191
Measurable objective. See Objectives
Median, 287
Mental ability:
  composite of, 323
  primary, 318
  See also Intelligence
Mental age, 329-31
  See also Deviation IQ; Intelligence quotient
Mental Measurements Yearbook, 5, 10, 341, 402, 452n
Metropolitan Achievement Tests, 235, 236-38, 289, 290, 363n
Miller Analogies Test, zoi
Monitoring, 455-58
  classroom achievement, 455-58
  student progress, 445-49
Multiple-choice item, 90-100


Navy General Classification Test, 382
Needs, 433-34
Nelson-Denny Reading Test, 269, 404-05
Nominal scale, 131
Nominations procedure, 146, 163-64
Normal curve. See Normal distribution
Normal distribution, 258, 281, 282, 283, 284
Normative scale, 414n
Norm(ing) group, 13, 280-81
Norm referenced tests. See Achievement tests; Standardized tests; Norm referencing
Norm referencing, Ml, 374, 379-81, 385-91, 393
Norms, 279, 280, 283, 292, 293, 33/, 301, 374, 379-81, 385-91, 393

Objectives, 20-45
  action part, 26-31, 216
  classification of, 26, 27, 3W5
  conditions part, “:’y
  criteria part, 32-33, 70, 216-17
  definition of, 24
  enabling, 33-34
  "Sr.t3m|-5«, 52, 55, 57, 134, 181, 189
  relation to outcomes. See Appropriateness
  terminal, 33-34
  testing for, 49-71
  unit, 33-34
Ohio Vocational Interest Survey, 417-19
Open classroom, 428
Otis-Lennon Mental Ability Test
Otis Quick-Scoring Intelligence Test, 382
Output, 459, 460, 461-65

Parallel item reliability, 259-60

Peabody Picture Vocabulary Test, 233, 330, 331, 345-46
Percentile rank, 283, 285-89, 291, 385, 388
Perception, 436-38
Performance checklist, 172-75, 185-88
  illustrations of, 173, 174-75, 185, 186, 187
  scoring, 186-88
  See also Performance test
Performance objective. See Objectives
Performance test, 172-88
  considerations for choosing, 179-80
  construction of, 180-86
  illustrations of, 175-76, 176-77, 178
  scoring, 186-88
Personality orientation, 429-38
Personal Orientation Inventory, 437
Phi Delta Kappan goals, 466-67
Physical education performance test, 186, 187
Piers-Harris Children’s Self-concept Scale, 425
Predictive validity, 230, 238-40, 245
Preparing Instructional Objectives, 70
Prescriptive Mathematics Inventory, 392, 394, 396-97, 455
Primary Mental Abilities Test, 318, 349
Process, 459, 460, 461, 465-69
Program. See System
Progressive Matrices. See Raven’s Progressive Matrices
Psychometric dimension, 210
Pygmalion in the Classroom, 192, 194

Random order, 86n
Ratio IQ. See Intelligence quotient
Raven’s Progressive Matrices, 331
Raw score, 279, 281, car, 336, 337, 385
Reading:
  achievement, 235, 236, 363-65
  level, 248, 267, 303-04
  objectives, 395
  scores, 387
  tests, 400-04
Reliability, 11, 131-35, 253-74, 306
  alternate forms, 261, 262
  building it in, 266-69





  definition of, 11, 2i>
  improving, 269-74
  interrater (of scoring), 131-34, 135, 273-74
  Kuder-Richardson formula 21, 258-59, 262
  of achievement tests, 378-79
  of intelligence tests, 338
  parallel item, 259-60, 270-71
  split half, 260-61, 262
  test retest, 261-62
  See also Standard error of measurement; Variability in test scores
Remote Associates Test, 354
Response set, 156, 220-23
Response bias. See Response set
Rod and Frame Test, 438
Roster, 446

Scale of Values, 435-36
Scales. See Attitude scales; Behavior scales; Scoring scales
Scatterplot, iU 2J! 21,1
School Sentiment Index, S, 427-28
Science:
  achievement, 370-72
  performance, 176-77
Score:
  anticipated achievement, 390
  grade equivalent, 289-92, J29.9-
  item performance, 290-91

2’' 

r9 221 2U 288 285 
"2X99 relereneej 

"j^noleU,. S„Cla.,.„„ 

»>andjrj _s0 

I'-e .8, 22.^9 » 

62 ^ Ictcnial ,y_ 

' IQUit 


  completion, 79-82
  matching, 100-04
  multiple choice, 90-100
  other two-choice classification, 86-90
  true false, 82-86
  unstructured, 76-79
Short form Test of Academic Aptitude, 334, 335-37, 346-47, 390n
Skills and competencies:
  measurement of, 172-88
Skills Monitoring System, 394, 454
Social desirability bias, 220, 423
Social studies:
  achievement, 368-70
  content map, 213, 215
Spearman-Brown formula, 261
Split half reliability, 260-61, 262
Standard deviation, 282, 283

  deviation unit, 282, 283
Standard error of measurement
Standardized tests, 292-93, 447-48, *35, 462-64. See also Achievement tests, standardized
Standard score, 280, 281-84, 291
  achievement, 390
  T score, 282, 283, 284
  Z score, 282, 283, 284

. » *>2 «; c« JO, "J- 

X-OIPP 

""“wT" ® M 58, 

State assessment, 402
Student:
  feedback, 453-54
  progress, 445-49

Si'uX.'.'Si'L ® ® «< 

^■i;;».Sc„Di.cip,Sl.,c ,« 

■>' I16-.9 ,a,,j 


  taxonomizing of, 39, 41
  See also Cognitive domain
System(s):
  applications of test data, 459-69
  approach, 21, 24, 44
  testing program, 468

Taxonomy:
  of affective domain, 27, 41-43, 64-67
  of cognitive domain, 26-27, 38-41, 60-64
  See also Affective domain; Cognitive domain
Teacher built tests. See Criterion referenced tests
Tennessee Self-concept Scale, 423-25
Terminal objectives, 33-34
Test:
  administration. See Administration of tests
  construction, 15, 77-202
  evaluation, 16, 207-307
  item. See Short-answer item; Essay item
  61-62, 63-64, 66-67
  score. See Score
  selection, 304-06
  standardized, 292-93
  taxonomies, 60-67
  use, 16
  21112
Testing:
  case studies of, 3-7
  in relation to teaching, 220, 224
  program, 443-70
  purposes of, 14-15, 444-45
  reasons for, 7-10
  tesforleachw 43-4j

Test-retest reliability, 261-62
  See Cognitive domain; Higher cognitive processes
Torrance Tests of Creativity, 353-54
True-false item, 82-86, m
T score, 282, 283, 284
Tuckman Teacher Feedback Form, 160-62

fe'o'fh lien, 

teion.JM ’’"'“''‘‘“"I 





Two-point attitude scale, 144-45, 153-57

Unstructured short answer item, 74-79, 106
Usability, 302-04, 306, 381

Validity, 11, 229-49, 305
  concurrent, 230, 232-34, 235, 243-44
  construct, 230, 235-38, 245
  content. See Appropriateness
  criterion, 230, 240-42, 244, 245-47
  definition of, 11, 229-31
  face, 223
  juries, 241, 247-48
  of achievement tests, 378
  of intelligence tests, 339-41
  predictive, 230, 238-40, 245
Values, 435-36
Variability in test scores:
  sources of, 262-66, 267

Wechsler Intelligence Scale for Children, 233, 234, 282, 283, 344-45, 346

Z score, 282, 283, 284



